Ron Howard solved the Markov decision problem with perfect knowledge of all
the transition probabilities and rewards. In a practical situation, the transition
probabilities may not be known exactly. Therefore, the problem this research attacks
is the Markov decision problem with uncertain transition probabilities.

In the case of perfect knowledge, the decision that maximizes the expected reward
or gain is chosen. When there is uncertainty in the transition probabilities, the
gains become random variables. Therefore, Bayesian decision theory is applied to
this problem. A loss function is defined and an a priori density is defined. Bayes'
formula and the loss function are used to compute a risk for each decision. The
decision that minimizes the risk is chosen.

Conceptually the problem is solved easily. However, transforming a density over
the transition probabilities to a density over the gains is a difficult problem. The
solution of this problem is the main contribution of this dissertation. Using these
results a technique is derived that allows a straightforward means to evaluate the
risks for each decision. Examples are presented that illustrate the technique.

The  result  of  this  research  is  a  logical  method  to  compute  the  risks  associated 
with  each  decision  when  there  is  uncertainty  over  the  transition  probabilities.  The 
decision  maker  then  selects  the  decision  that  minimizes  the  risk. 
PREFACE


Decision making is an issue constantly before either the developer or
user of U.S. Air Force space, missile, tactical, or other systems. Yet
since Howard's significant work of over 12 years ago there has been little
progress in this area on the important methods pioneered by Howard. Because
of the importance of this area to applied Air Force needs, the results
embodied in this research report were developed and illustrated through
numerous examples presented herein.

This research report was prepared under research contracts supported by
the U.S. Air Force Office of Scientific Research under AFOSR Grant No. 72-2166,
Design of Aerospace Systems, and the U.S. Air Force Space and Missile Systems
Organization under Contract No. F04701-72-C-0273, Advanced Space Guidance,
and this report constitutes part of the final report under these contracts.

The research described in this report, "Bayesian Decision Theory Applied
To The Finite State Markov Decision Problem," UCLA-ENG-7278, by William Ross
Osgood, was carried out under the direction of C.T. Leondes and E.B. Stear,
Co-Principal Investigators in the Schools of Engineering in the University
of California at Los Angeles and Santa Barbara, respectively.
Chapter 1

PROBLEM  DEFINITION 


The objective of this research is to apply Bayesian decision
theory to the finite state Markov decision problem when the transition
probabilities are unknown. A Markov decision problem exists
when a decision maker has available a set of K decisions. Each
decision specifies a particular Markov chain and a set of rewards.
The decision maker selects the decision that maximizes his gain or
expected reward.

Bayesian decision theory can be used when there is uncertainty
over the transition probabilities. An a priori density is specified
over the probabilities and Bayes' formula is used to compute an updated
posteriori density after observations are recorded. The decision
maker selects the decision that minimizes his expected loss
or risk. As more observations are recorded the posteriori density
concentrates its probability mass over the actual values of the transition
probabilities and the risk minimizing decision maximizes the
gain.

It is assumed that the reader has knowledge of the theory of
Markov chains (see Reference [7]). The following notation is used
in this dissertation. There are N states in the Markov chain under
consideration. The probability of making a transition from state i
to state j is denoted by $p_{ij}$. The $N \times N$ transition matrix is denoted
by P. The steady state probability vector is denoted by $\pi$.
A knowledge of Bayesian decision theory is also assumed (see
Reference [3]). The following notation is used. The set of states of nature
is denoted by $\Omega$. The observation is written as $x_n$. The a priori
density over $\Omega$ is $\xi_0(\omega)$ where $\omega$ is an element of $\Omega$. The posteriori
density is denoted by $\xi(\omega \mid x_n)$ and is computed from Bayes' formula,

$$\xi(\omega \mid x_n) = \frac{l(x_n \mid \omega)\, \xi_0(\omega)}{\int_\Omega l(x_n \mid \omega)\, \xi_0(\omega)\, d\omega}$$

where $l(x_n \mid \omega)$ is the likelihood function. For each element $\omega$ of $\Omega$
and each decision k, a loss $L(k \mid \omega)$ is incurred. The risk $\rho(k)$ is the
expected loss,

$$\rho(k) = \int_\Omega L(k \mid \omega)\, \xi(\omega \mid x_n)\, d\omega$$

Bayes decision k* minimizes the risk,

$$\rho(k^*) = \min_{1 \le k \le K} \{\rho(k)\}$$
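
As an illustration only, the posterior, risk, and Bayes decision defined above can be sketched on a finite set of states of nature, where the integrals reduce to sums. The prior, likelihood, and loss values are made-up numbers, and the function names `posterior` and `bayes_decision` are hypothetical; none of them come from this report.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Bayes' formula on a finite set of states of nature (sums replace integrals)."""
    joint = [l * p for l, p in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]


def bayes_decision(loss, post):
    """loss[k][w] plays the role of L(k|w); returns (k*, list of risks)."""
    risks = [sum(l * q for l, q in zip(row, post)) for row in loss]
    k_star = min(range(len(risks)), key=risks.__getitem__)
    return k_star, risks


# Hypothetical numbers: two states of nature, two decisions.
prior = [Fraction(1, 2), Fraction(1, 2)]
likelihood = [Fraction(1, 4), Fraction(3, 4)]  # l(x | w) for the observed x
post = posterior(prior, likelihood)
loss = [[0, 2], [1, 0]]
k_star, risks = bayes_decision(loss, post)
```

With these numbers the posterior is (1/4, 3/4), the risks are 3/2 and 1/4, and the second decision (index 1) is the Bayes decision.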


1.1 The Markov Decision Problem


Once a Markov chain is defined, a reward structure can be
placed over the states. Suppose that payoff $r_i$ is received when state
i is occupied. The N-vector $r = (r_1, \ldots, r_N)$ is called the reward
vector associated with the N-state Markov chain. In steady state,
the expected payoff or gain is denoted by $\Delta$ where

$$\Delta = \sum_{i=1}^{N} r_i \pi_i = \langle r, \pi \rangle$$

Now, suppose that there are K decisions available to a decision
maker. Each decision i, i = 1, ..., K, specifies a unique N-state
Markov chain with transition matrix $P_i$. The corresponding reward
vector is denoted by $r^i$. The gain under decision i is denoted by $\Delta_i$
where

$$\Delta_i = \langle \pi^i, r^i \rangle$$

$$\pi^i = \pi^i P_i$$

The decision maker selects the decision that maximizes his expected
payoff or gain. In other words, he will select decision k* such that

$$\Delta_{k^*} = \max_{1 \le i \le K} \Delta_i$$
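
The selection rule with perfect knowledge can be sketched for two-state chains, where the steady state of P = [[1-a, a], [b, 1-b]] has the closed form (b, a)/(a+b). This is a minimal sketch under that restriction; the two decisions, their rewards, and the function names are hypothetical and are not taken from this report.

```python
from fractions import Fraction as F


def steady_state_2(P):
    """Steady state of a two-state chain P = [[1-a, a], [b, 1-b]]."""
    a, b = P[0][1], P[1][0]
    return (b / (a + b), a / (a + b))


def gain(P, r):
    """Gain <pi, r> of the chain P with reward vector r."""
    pi = steady_state_2(P)
    return sum(x * y for x, y in zip(pi, r))


# Hypothetical decisions: (transition matrix, reward vector).
decisions = [
    ([[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]], (F(6), F(3))),
    ([[F(9, 10), F(1, 10)], [F(1, 2), F(1, 2)]], (F(4), F(1))),
]
gains = [gain(P, r) for P, r in decisions]
k_star = max(range(len(gains)), key=gains.__getitem__)
```

Here the gains are 4 and 7/2, so the first decision (index 0) is selected.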


1.2 The Markov Decision Problem with Uncertainty

The Markov decision problem defined above assumes that all
transition probabilities and rewards are known with certainty. However,
there may be a case where there is uncertainty in some or all
of the transition probabilities and/or rewards. The case of perfect
information was developed by Ron Howard [7] in 1960. After Howard
completed his work others at MIT continued to investigate this
problem with uncertainties. The goal of these works should have
been to specify the "best" decision against some criterion. However,
their work, summarized by Martin [9] in 1967, did not include a
means to specify a decision.

This research applies Bayesian decision theory to the Markov
decision problem so that a decision can be specified under uncertainty.
The transition probabilities are taken as uncertain, but the
rewards are assumed known.

Martin showed that if the states of nature are the set of all
possible transition matrices, and the matrix beta density is used as
the a priori density, then Bayes' formula transforms observations of
state transitions into a posteriori density that is also matrix beta.
Therefore, the set of states of nature is defined as

$$\Omega = A_1^N \times \cdots \times A_K^N$$

where $A_i^N$ is the set of all possible $N \times N$ transition matrices under
decision i. An element of $\Omega$ is denoted by $\omega$ where

$$\omega = (P_1, P_2, \ldots, P_K)$$

and $P_i$ is the transition matrix under decision i.

The end product in applying Bayesian decision theory is a risk
associated with each decision. The risk is defined as the expected
loss. If $L(i \mid \omega)$ associates a loss to each decision i when $\omega \in \Omega$ is the
state of nature, then the risk $\rho(i)$ becomes

$$\rho(i) = E\, L(i \mid \omega) = \int_\Omega L(i \mid \omega)\, \xi(\omega \mid x_n)\, d\omega$$
where $\xi(\omega \mid x_n)$ is the posteriori density over $\Omega$. The decision maker's
objective is to maximize his gain, therefore the loss will be defined
in terms of the gain. Take element $\omega \in \Omega$ where

$$\omega = (P_1, \ldots, P_K)$$

To each $P_i$ the steady state probability vector $\pi^i$ can be computed.
Then, for each decision i the gain $\Delta_i$ is given by

$$\Delta_i = \langle \pi^i, r^i \rangle$$

Suppose decision i is selected resulting in a payoff of $\Delta_i$ units per
transition. But, if there is a $\Delta_j$, j = 1, ..., K such that $\Delta_j > \Delta_i$ then
the decision maker suffers a loss of at least $\Delta_j - \Delta_i$ per transition
because he selected decision i instead of decision j. The loss function
$L(i \mid \omega)$ is defined as the maximum loss and is given by

$$L(i \mid \omega) = \max_{1 \le j \le K} \{\Delta_j - \Delta_i\}$$

Thus, for each $\omega \in \Omega$, $L(i \mid \omega)$ is the maximum loss per transition when
decision i is chosen.

In order to calculate the risk, the expression $\max_{1 \le j \le K} \{\Delta_j - \Delta_i\}$
must be written in terms of $\omega = (P_1, \ldots, P_K)$. Gain $\Delta_i$ is written as

$$\Delta_i = \langle \pi^i, r^i \rangle$$

The problem with this expression is that $\pi^i$ must be specified as an
explicit function of $P_i$. Since the steady state probability vector $\pi^i$
is uniquely related to its transition matrix $P_i$ by $\pi^i = \pi^i P_i$,
the existence of a function $\pi^i = \sigma(P_i)$ has some intuitive appeal.

Now, gain $\Delta_i$ is written as

$$\Delta_i = \langle \sigma(P_i), r^i \rangle$$

This expression is substituted to get the desired expression

$$L(i \mid \omega) = \max_{1 \le j \le K} \{ \langle \sigma(P_j), r^j \rangle - \langle \sigma(P_i), r^i \rangle \} \tag{1.1}$$

The risk function becomes

$$\rho(i) = \int_\Omega \max_{1 \le j \le K} \{ \langle \sigma(P_j), r^j \rangle - \langle \sigma(P_i), r^i \rangle \}\, \xi(P_1, \ldots, P_K \mid x_n)\, d\omega, \qquad i = 1, \ldots, K \tag{1.2}$$


The decision maker continues to make observations $x_n$ until he is
satisfied that the posteriori density has concentrated a sufficient
amount of probability mass over the actual collection of transition
matrices. Then the risk is computed. The decision that minimizes
the risk is chosen.

The problem is to evaluate Equation 1.2. In picking the risk
minimizing decision, the absolute value of the risk is not important.
That is, the decision that results in the smallest risk, whatever its
value, is chosen. Suppose that risk $\rho(i)$ is used as a reference. The
minimum risk $\rho(k^*)$ will satisfy,

$$\rho(k^*) - \rho(i) \le \rho(k) - \rho(i), \qquad k = 1, \ldots, K \tag{1.3}$$

An alternate way of expressing Equation 1.3 is to say that k* satisfies
the expression,

$$\rho(k^*) - \rho(i) = \min_{1 \le k \le K} \{\rho(k) - \rho(i)\}$$

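Now, the expression $\rho(i) - \rho(k)$ is evaluated by substituting Equation
1.2. The term $\max_j \langle \sigma(P_j), r^j \rangle$ is common to both integrands and cancels,

$$\begin{aligned}
\rho(i) - \rho(k) &= \int_\Omega \max_{1 \le j \le K} \{ \langle \sigma(P_j), r^j \rangle - \langle \sigma(P_i), r^i \rangle \}\, \xi(\omega \mid x_n)\, d\omega \\
&\quad - \int_\Omega \max_{1 \le j \le K} \{ \langle \sigma(P_j), r^j \rangle - \langle \sigma(P_k), r^k \rangle \}\, \xi(\omega \mid x_n)\, d\omega \\
&= \int_\Omega \left( \langle \sigma(P_k), r^k \rangle - \langle \sigma(P_i), r^i \rangle \right) \xi(\omega \mid x_n)\, d\omega \\
&= \int_\Omega \langle \sigma(P_k), r^k \rangle\, \xi(\omega \mid x_n)\, d\omega - \int_\Omega \langle \sigma(P_i), r^i \rangle\, \xi(\omega \mid x_n)\, d\omega \\
&= E(\Delta_k \mid x_n) - E(\Delta_i \mid x_n)
\end{aligned} \tag{1.4}$$

Thus, expression $\rho(k^*) - \rho(i)$ is given by

$$\rho(k^*) - \rho(i) = \min_{1 \le k \le K} \{ E(\Delta_i \mid x_n) - E(\Delta_k \mid x_n) \} \tag{1.5}$$

Define function $\eta^i(k)$ by

$$\eta^i(k) = E(\Delta_i \mid x_n) - E(\Delta_k \mid x_n) \tag{1.6}$$

The decision k* that minimizes the risk minimizes $\eta^i(k)$,

$$\eta^i(k^*) = \min_{1 \le k \le K} \{\eta^i(k)\}$$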

Therefore, the problem of evaluating the risk for each decision is
transformed to a problem of evaluating function $\eta^i(k)$ for each decision
k = 1, ..., K.
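
The equivalence between minimizing the risk and maximizing the expected gain can be checked numerically. The following is a rough Monte Carlo sketch, not the technique developed in this report: it assumes two-state chains whose off-diagonal transition probabilities have independent beta posteriors (taken here as the two-state case of the matrix beta density), and every parameter value and function name is hypothetical.

```python
import random


def gain_two_state(a, b, r):
    """Gain <pi, r> of P = [[1-a, a], [b, 1-b]], whose steady state is (b, a)/(a+b)."""
    return (b * r[0] + a * r[1]) / (a + b)


def risks_and_expected_gains(posteriors, rewards, n_samples=20000, seed=0):
    """Sample-average risk (expected maximum loss) and expected gain per decision."""
    rng = random.Random(seed)
    K = len(posteriors)
    risk = [0.0] * K
    mean_gain = [0.0] * K
    for _ in range(n_samples):
        gains = []
        for (a_par, b_par), r in zip(posteriors, rewards):
            a = rng.betavariate(*a_par)  # draw p_12 from its assumed posterior
            b = rng.betavariate(*b_par)  # draw p_21 from its assumed posterior
            gains.append(gain_two_state(a, b, r))
        best = max(gains)
        for i in range(K):
            risk[i] += (best - gains[i]) / n_samples  # loss L(i|w) = max_j gain_j - gain_i
            mean_gain[i] += gains[i] / n_samples
    return risk, mean_gain


# Hypothetical beta parameters for (p_12, p_21) and reward vectors, one per decision.
posteriors = [((3, 5), (2, 6)), ((8, 2), (4, 4)), ((1, 1), (1, 1))]
rewards = [(6.0, 3.0), (4.0, 1.0), (5.0, 2.0)]
risk, mean_gain = risks_and_expected_gains(posteriors, rewards)
k_star = min(range(len(risk)), key=risk.__getitem__)
```

Because every decision's loss shares the same term, the largest sampled gain, the differences between the estimated risks equal the differences between the estimated expected gains with the sign reversed, so `k_star` is also the decision with the largest estimated expected gain.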

The expected value of the gain $E(\Delta_k \mid x_n)$, k = 1, ..., K cannot be
computed in a straightforward manner. The procedure required is
developed by first looking at a general case in Chapter 2 where there
is one Markov chain (one gain) and the uncertainty over the transition
matrix is a general density $h_P(P)$. The expected value of the gain
is $E(\Delta)$. In Chapter 3 the density over the transition matrix is defined
as matrix beta and the observation is used to compute the posteriori
density $\xi(P \mid x_n)$. The expected value of the gain conditioned on the
observation is $E(\Delta \mid x_n)$. In Chapter 4 decisions are introduced. The
results of Chapter 3 are used to compute $E(\Delta_k \mid x_n)$ for each decision
k = 1, ..., K. The risk minimizing decision k* is selected by using
Equation 1.6. An example is presented that illustrates the procedure
developed.

Two secondary issues are discussed in Chapters 5 and 6. In
Chapter  5  the  possibility  of  specifying  the  uncertainty  over  the  steady 
state  probabilities  rather  than  the  transition  probabilities  is  explored. 
In  Chapter  6  the  theory  used  to  evaluate  the  expected  value  of  the 
gains  is  used  to  investigate  the  convergence  properties  of  two  state 
Markov  chains. 

The  results  of  this  research  are  summarised  in  Chapter  7  and 
several  topics  for  future  research  are  outlined. 
Chapter 2

COMPUTING THE EXPECTED VALUE OF THE GAIN


The objective of this Chapter is to compute the expected value
of the gain $E(\Delta)$ for the general case where there is one Markov chain
and the uncertainty over the transition matrix is given by density
$h_P(P)$. The gain is

$$\Delta = \langle \pi, r \rangle = \langle \sigma(P), r \rangle$$

where the existence of function $\sigma(\cdot)$ is hypothesized. The expected
value of the gain could be expressed as

$$E(\Delta) = \int_{A^N} \langle \sigma(P), r \rangle\, h_P(P)\, dP \tag{2.1}$$

where $A^N$ is the set of all possible $N \times N$ transition matrices. Even
though Equation 2.1 looks simple enough there is a serious complication.
That is, set $A^N$ is not a closed and convex subset of Euclidean
space. This complicates the integration operation. What is required
is some transformation that allows integration in Euclidean space.

The transition matrix P has N rows, each of which is a probability
vector. The i-th row is called the i-th transition vector and is
denoted by $p_i$. Suppose that $p_i$ is known with certainty. Clearly, this
knowledge conveys no information about $p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_N$. In
other words, each row of P should be probabilistically independent.
Therefore, density $h_P(P)$ can be written as,

$$h_P(P) = h_1(p_1) \cdots h_N(p_N)$$

where $h_i(p_i)$ is a density function over the set $\Pi_N$, the set of all N
dimensional probability vectors. Equation 2.1 becomes

$$E(\Delta) = \int_{\Pi_N} \cdots \int_{\Pi_N} \langle \sigma(P), r \rangle\, h_1(p_1) \cdots h_N(p_N)\, dp_1 \cdots dp_N \tag{2.2}$$

To see how a probability density $h_i(p_i)$ over $\Pi_N$ can be transformed
to a density over a closed convex set, a two dimensional case is
examined.

Consider a two dimensional probability vector $p = (p_1, p_2)$. The
density over p is denoted by $h_p(p_1, p_2)$. Since $p_1$ and $p_2$ are constrained
to satisfy the conditions,

i) $p_1, p_2 \ge 0$
ii) $p_1 + p_2 = 1$

the probability mass must lie over the line $p_1 + p_2 = 1$ in the positive
quadrant as drawn in Figure 2.1. A simpler way to characterize the
density would be to define it as a function of a single variable s where
s = 0 corresponds to p = (1, 0) and $s = \sqrt{2}$ corresponds to p = (0, 1), for
example. In this case the density over p is denoted by $f(s)$ and is
drawn in Figure 2.2. Notice that density $h_p(p_1, p_2)$ is transformed to
density $f(s)$ where s belongs to the closed convex set

$$\{ s \in E^1 \mid 0 \le s \le \sqrt{2} \}$$

Vector p and scalar s are related through the equation

$$p = (1, 0) + \frac{s}{\sqrt{2}}\, (-1, 1)$$
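
The two dimensional relation between p and s can be written out directly. This is a small sketch only; the function names `p_from_s` and `s_from_p` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2)


def p_from_s(s):
    """p = (1, 0) + (s / sqrt(2)) * (-1, 1), for 0 <= s <= sqrt(2)."""
    return (1 - s / SQRT2, s / SQRT2)


def s_from_p(p):
    """Inverse map: distance along the segment from (1, 0) to p."""
    return SQRT2 * p[1]


start = p_from_s(0.0)      # the end point p = (1, 0)
end = p_from_s(SQRT2)      # the end point p = (0, 1)
round_trip = s_from_p(p_from_s(0.6))
```

The scalar s measures Euclidean distance along the line segment, which is why its range is the interval from 0 to the square root of 2 rather than from 0 to 1.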


Now consider a two-dimensional transition matrix P. The
density over the first row is denoted by $f_1(s_1)$, and the density over
the second row is denoted by $f_2(s_2)$. Since rows of matrix P are
probabilistically independent, the density over matrix P, denoted by
$f_P$, is given by

$$f_P(t) = f_1(s_1)\, f_2(s_2)$$

where

$$t = (s_1, s_2)$$

A density $f_P(t)$ is drawn in Figure 2.3. Notice again that the new
density $f_P(t)$ is defined over a closed convex subset of $E^2$.

This notion of expressing a two dimensional probability vector
as a scalar is extended to the N state case in this Chapter. Density
$h_i(p_i)$ is easily transformed to density $f_i(s_i)$ where $s_i$ is a member of
set $T_N$ which is a subset of $E^{N-1}$. Density $h_P(P)$ is transformed to
density $f_P(t)$ where

$$f_P(t) = f_1(s_1) \cdots f_N(s_N)$$

$$t = (s_1, \ldots, s_N)$$

and t is a member of set $T_N^N = T_N \times \cdots \times T_N$ (N times). It will be shown
that Equation 2.2 can be written as

$$E(\Delta) = \int_{T_N^N} \langle w(t)\, U, r \rangle\, f_P(t)\, dt + \langle \pi_0, r \rangle \tag{2.3}$$
This  expression  can  be  evaluated  either  analytically  for  special  Markov 
chains  or  numerically  on  the  computer. 
In order to evaluate Equation 2.3, function $w(t)$ must be derived,
density $f_P(t)$ has to be transformed from $h_P(P)$, and matrix U and
vector $\pi_0$ have to be defined. The steps taken to evaluate Equation
2.3 are,

1) Sets $T_N$ and $T_N^N$ are defined.

2) The transformations that take probability vector p in $\Pi_N$
to vector s in $T_N$, and transition matrix P in $A^N$ to vector
t in $T_N^N$ are defined. Matrix U and vector $\pi_0$ are defined.

3) Functions $w(t)$ and $\sigma(P)$ are derived.

4) Given a transition matrix P and density $h_P(P)$, the density
$h_\pi(\pi)$ over the steady state probability vector is sought.
The transformed density $f_\pi(s)$ is computed from density
$f_P(t)$. Although this development is not used in evaluating
Equation 2.3, it is included because of its fundamental
importance. Equation 2.3 could also be written as

$$E(\Delta) = \int_{T_N} \langle sU, r \rangle\, f_\pi(s)\, ds + \langle \pi_0, r \rangle$$

5) Equation 2.3 is evaluated.

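2.1 Sets $T_N$ and $T_N^N$

All vectors in $\Pi_N$ have the following property. Given any two
probability vectors $p_1$ and $p_2$ the difference vector $p_1 - p_2$ lies in
the hyperplane H defined by

$$H = \{ x \in E^N \mid \langle x, e \rangle = 0 \}$$

where e denotes the N-vector with every component equal to one.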
Conceptually, probability vectors "touch" hyperplane H as shown in
the three dimensional example drawn in Figure 2.4. Since every
probability vector can be uniquely represented by a point in H, the
set of all these points is a closed convex subset of $E^{N-1}$. H is an
N-1 dimensional hyperplane, therefore a basis $\chi$ can be constructed
in H where

$$\chi = \{ u_1, u_2, \ldots, u_{N-1} \}$$

Then any vector $z \in H$ is given by

$$z = \sum_{i=1}^{N-1} s_i u_i$$

Vector z with respect to basis $\chi$ is given by the N-1 dimensional
vector s where

$$s = (s_1, s_2, \ldots, s_{N-1})$$

Now, set $\chi$ together with any vector $\pi_0$ in $\Pi_N$ is clearly a basis of $E^N$.
Since $\{\chi, \pi_0\}$ is a basis in $E^N$, every vector $p \in \Pi_N$ can be
written as

$$p = \sum_{i=1}^{N-1} s_i u_i + s_N \pi_0$$

Vector $p - s_N \pi_0$ lies in H so

$$0 = \langle p - s_N \pi_0, e \rangle = \langle p, e \rangle - s_N \langle \pi_0, e \rangle = 1 - s_N$$

Therefore, $s_N = 1$ and vector p with respect to basis $\{\chi, \pi_0\}$ is given
by

$$(s_1, s_2, \ldots, s_{N-1}, 1)$$

Since the N-th component is always unity, a probability vector p will
have a unique representation in basis $\chi$ given by the N-1 dimensional
vector s where

$$s = (s_1, \ldots, s_{N-1})$$

The set of all probability vectors is given by $\Pi_N$. To each
element p of $\Pi_N$ there is a unique s in set $E^{N-1}$. The collection of
vectors s representing all $p \in \Pi_N$ is denoted by $T_N$ where

$$T_N = \left\{ s \in E^{N-1} \;\middle|\; \pi_0 + \sum_{i=1}^{N-1} s_i u_i \in \Pi_N \right\} \tag{2.4}$$


An $N \times N$ transition matrix P can be represented by the N
vectors in $E^N \times \cdots \times E^N$,

$$(p_1, p_2, \ldots, p_N)$$

where $p_i$ is the i-th row of matrix P. Since each $p_i$ is a vector in $\Pi_N$,
it has a unique representation $s_i$ in $T_N$. Therefore, matrix P has a
unique representation in set $T_N \times \cdots \times T_N$ given by $(s_1, s_2, \ldots, s_N)$.
The set $T_N \times \cdots \times T_N$ is denoted by $T_N^N$ and the N(N-1) dimensional
vector $(s_1, \ldots, s_N)$ is denoted by t.

2.2 Transformations $\Pi_N \to T_N$ and $A^N \to T_N^N$

In this section the transformation that takes probability vectors
to their representation in $T_N$ is derived. Also, the transformation
that takes transition matrices in $A^N$ to their representation in $T_N^N$ is
derived.

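First, the transformation from $\Pi_N$ to $T_N$ is specified. Vector
$p - \pi_0$ lies in H if $p \in \Pi_N$ and is given by

$$p - \pi_0 = \sum_{i=1}^{N-1} s_i u_i$$

The reciprocal basis $Y = \{ v_1, \ldots, v_{N-1} \}$ of basis vectors $\chi$ is
used to generate the expression

$$\langle p - \pi_0, v_j \rangle = \left\langle \sum_{i=1}^{N-1} s_i u_i,\; v_j \right\rangle = s_j, \qquad j = 1, \ldots, N-1$$

Therefore, the representation s of probability vector p satisfies

$$s = (p - \pi_0)\, V$$

where the columns of matrix V are the reciprocal basis vectors.

Take any vector $s \in T_N$. Vector $z = sU$, where the rows of matrix U
are the basis vectors $u_i$, lies in H so there is a vector $p \in \Pi_N$
such that $p - \pi_0 = z$ or

$$p = \pi_0 + sU$$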

Next, the transformation from $A^N$ to $T_N^N$ is specified. Take
any matrix $P \in A^N$. The rows of P lie in $\Pi_N$. Vector $s_i$ is the unique
representation of the i-th row $p_i$ where

$$s_i = (p_i - \pi_0)\, V, \qquad i = 1, \ldots, N$$

This expression specifies vector $t = (s_1, s_2, \ldots, s_N)$, the representation
of matrix P in set $T_N^N$.

Now, take any vector $t \in T_N^N$. Vector $z_i = s_i U$ lies in H.
There exists a vector $p_i \in \Pi_N$ such that $p_i = \pi_0 + s_i U$. If the i-th row
of an $N \times N$ matrix P is taken as $p_i$, then matrix P lies in $A^N$.

2.3 Specifying a Basis $\chi$

In this Section a particular basis $\chi$ is specified. This basis is
used in many of the future developments and examples.

Let $e_i$ be an N dimensional vector with all zeros except for
the i-th component which is one. Since $\pi_0$ can be any vector in $\Pi_N$
define $\pi_0$ as

$$\pi_0 = e_1 \tag{2.5}$$

Define the basis $\chi$ by

$$u_i = \frac{1}{\sqrt{2}}\, (e_{i+1} - \pi_0), \qquad i = 1, \ldots, N-1 \tag{2.6}$$

In order that the vectors $u_i$ defined above are basis vectors of
H, each must lie in H and the collection $\chi$ must be linearly independent.
Vector $u_i$ lies in H if $\langle u_i, e \rangle = 0$. Substituting for $u_i$
gives

$$\langle u_i, e \rangle = \frac{1}{\sqrt{2}} \left( \langle e_{i+1}, e \rangle - \langle \pi_0, e \rangle \right) = \frac{1}{\sqrt{2}}\, (1 - 1) = 0$$

To prove linear independence, take any vector $c \in E^{N-1}$. If
$\chi$ is a linearly independent collection then $cU = 0$ implies that $c = 0$.
Vector $cU = \sum_i c_i u_i$ is given by

$$\begin{aligned}
cU &= \frac{1}{\sqrt{2}} \left[ \sum_{i=1}^{N-1} c_i e_{i+1} - \pi_0 \sum_{i=1}^{N-1} c_i \right] \\
&= \frac{1}{\sqrt{2}} \left[ (0, c_1, c_2, \ldots, c_{N-1}) - (\langle c, e \rangle, 0, \ldots, 0) \right] \\
&= \frac{1}{\sqrt{2}}\, (-\langle c, e \rangle, c_1, c_2, \ldots, c_{N-1})
\end{aligned}$$

Clearly $cU = 0$ implies that $c = 0$ and hence $\chi$ is a linearly independent
collection.


The reciprocal basis Y satisfies

$$\langle v_j, u_i \rangle = \delta_{ij}, \qquad i, j = 1, \ldots, N-1$$

It can be easily shown that a reciprocal basis to $\chi$ is given by

$$v_i = \sqrt{2}\, e_{i+1}, \qquad i = 1, \ldots, N-1 \tag{2.7}$$
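
The basis of Expression 2.6, the reciprocal basis of Expression 2.7, and the two transformations of Section 2.2 can be checked numerically. The following sketch assumes $\pi_0 = e_1$ as in Expression 2.5; the function names `basis`, `to_s`, and `to_p` are hypothetical.

```python
import math


def basis(N):
    """U has rows u_i = (e_{i+1} - e_1)/sqrt(2); V has columns v_i = sqrt(2) e_{i+1}."""
    c = 1 / math.sqrt(2)
    U = [[-c if j == 0 else (c if j == i + 1 else 0.0) for j in range(N)]
         for i in range(N - 1)]
    V = [[math.sqrt(2) if i == j + 1 else 0.0 for j in range(N - 1)]
         for i in range(N)]
    return U, V


def to_s(p, V, pi0):
    """s = (p - pi_0) V."""
    d = [x - y for x, y in zip(p, pi0)]
    return [sum(d[i] * V[i][j] for i in range(len(d))) for j in range(len(V[0]))]


def to_p(s, U, pi0):
    """p = pi_0 + s U."""
    return [pi0[j] + sum(s[i] * U[i][j] for i in range(len(s)))
            for j in range(len(pi0))]


N = 3
U, V = basis(N)
pi0 = [1.0, 0.0, 0.0]
p = [0.2, 0.3, 0.5]
s = to_s(p, V, pi0)       # representation of p, two components
p_back = to_p(s, U, pi0)  # recovers p up to rounding
UV = [[sum(U[i][k] * V[k][j] for k in range(N)) for j in range(N - 1)]
      for i in range(N - 1)]
```

The product `UV` comes out as the identity up to rounding, which is the reciprocal basis condition written in matrix form.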


Example 2.1: Consider a three state Markov chain. A basis
for H, defined in Expression 2.6, is given by

$$\chi = \left\{ \tfrac{1}{\sqrt{2}}\, (-1, 1, 0),\;\; \tfrac{1}{\sqrt{2}}\, (-1, 0, 1) \right\}$$

The reciprocal basis, defined in Expression 2.7, is given by

$$Y = \left\{ \sqrt{2}\, (0, 1, 0),\;\; \sqrt{2}\, (0, 0, 1) \right\}$$

Vectors $\pi_0$, $u_1$, $u_2$, $v_1$, and $v_2$ are shown in Figure 2.5.


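Set $T_N$ in this basis is given by,

$$\begin{aligned}
T_N &= \left\{ s \in E^{N-1} \;\middle|\; \sum_{i=1}^{N-1} \frac{s_i}{\sqrt{2}}\, (e_{i+1} - \pi_0) + \pi_0 \in \Pi_N \right\} \\
&= \left\{ s \in E^{N-1} \;\middle|\; \left( 1 - \frac{\langle s, e \rangle}{\sqrt{2}} \right) \pi_0 + \frac{1}{\sqrt{2}} \sum_{i=1}^{N-1} s_i e_{i+1} \in \Pi_N \right\} \\
&= \left\{ s \in E^{N-1} \;\middle|\; \tfrac{1}{\sqrt{2}} \left( \sqrt{2} - \langle s, e \rangle,\; s_1, \ldots, s_{N-1} \right) \in \Pi_N \right\}
\end{aligned}$$

Since the above expression contains a probability vector, the components
must satisfy the conditions

i) $0 \le \tfrac{1}{\sqrt{2}} \left( \sqrt{2} - \langle s, e \rangle \right) \le 1$

and

ii) $0 \le s_i / \sqrt{2} \le 1$, i = 1, ..., N-1

Condition i) after some manipulation becomes

$$0 \le \langle e, s \rangle \le \sqrt{2}$$

Therefore, set $T_N$ is defined as

$$T_N = \left\{ s \in E^{N-1} \;\middle|\; 0 \le \langle e, s \rangle \le \sqrt{2},\;\; 0 \le s_i \le \sqrt{2},\;\; i = 1, \ldots, N-1 \right\}$$

Sets $T_N$ for N = 2, 3, 4 are drawn in Figure 2.6.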
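
The final description of the set can be compared with the direct requirement that the corresponding vector p be a probability vector. The sketch below assumes the basis of Expression 2.6 with $\pi_0 = e_1$; the function names and the tolerance are hypothetical choices.

```python
import math

SQRT2 = math.sqrt(2)


def in_T(s, tol=1e-12):
    """Membership test: 0 <= <e, s> <= sqrt(2) and 0 <= s_i <= sqrt(2) for every i."""
    total = sum(s)
    return (-tol <= total <= SQRT2 + tol
            and all(-tol <= x <= SQRT2 + tol for x in s))


def p_from_s(s):
    """p = pi_0 + s U with pi_0 = e_1 and u_i = (e_{i+1} - e_1)/sqrt(2)."""
    return [1 - sum(s) / SQRT2] + [x / SQRT2 for x in s]


def is_probability_vector(p, tol=1e-12):
    """Direct check: components in [0, 1] and summing to one."""
    return all(-tol <= x <= 1 + tol for x in p) and abs(sum(p) - 1) <= 1e-9


inside = in_T([0.5, 0.5])    # component sum 1.0 is below sqrt(2)
outside = in_T([1.0, 1.0])   # component sum 2.0 exceeds sqrt(2)
```

For N = 3 the set is a right triangle in the plane with legs of length square root of 2, so the second point falls outside even though each of its components is individually admissible.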
2.4 The Functions $\sigma: A^N \to \Pi_N$ and $w: T_N^N \to T_N$

The steady state probability vector $\pi$ can be implicitly computed
from matrix P by using Z-transforms or by computing the eigenvector
of matrix P corresponding to the unity eigenvalue. In this
section an explicit expression $\pi = \sigma(P)$ is derived and the desired
expression $s = w(t)$ is specified.

2.4.1 The Function $\sigma: A^N \to \Pi_N$

A Markov chain with N states and ergodic transition matrix P
is given. The probability distribution over the states at time $t_n$ is
given by the N-vector $\pi(n)$. At time $t_{n+1}$, after one transition, the
probability distribution over the states is

$$\pi(n+1) = \pi(n)\, P \tag{2.8}$$

Since P is ergodic $\pi(n)$ converges to vector $\pi$, the steady state probability
vector, as $n \to \infty$.

Define $N \times N$ matrix $P_0$ by

$$P_0 = \begin{pmatrix} \pi_0 \\ \vdots \\ \pi_0 \end{pmatrix} \tag{2.9}$$

The rows of $P_0$ are the vector $\pi_0$. Given any vector $p \in \Pi_N$ the
following relationship holds,

$$p P_0 = \pi_0$$

Therefore, $\pi_0 - p P_0 = 0$ for all $p \in \Pi_N$.

The expression $\pi_0 - \pi(n) P_0$ is added to Equation 2.8 to get

$$\pi(n+1) = \pi(n)\, P + \pi_0 - \pi(n)\, P_0$$

Rearranging gives

$$\pi(n+1) = \pi(n)\, (P - P_0) + \pi_0$$

The asymptotic behavior of this expression is

$$\pi = \pi\, (P - P_0) + \pi_0$$

Solving for $\pi$ gives

$$\pi = \pi_0\, (I - P + P_0)^{-1} \tag{2.10}$$

Equation 2.10 specifies the mapping $\sigma: P \to \pi$.

The following Theorem proves the existence of $(I - P + P_0)^{-1}$
when P is ergodic.

Theorem 2.1: If P is ergodic then $(I - P + P_0)^{-1}$ exists.

Proof: The inverse of matrix $(I - P + P_0)$ exists if the null
space, denoted by $N(I - P + P_0)$, is void (contains only the zero
vector). Take any $x \in E^N$. Vector x lies in the null space if

$$x\, (I - P + P_0) = 0$$

or equivalently

$$x = xP - xP_0$$

The i-th row of $P - P_0$ is $p_i - \pi_0$, so x is a linear combination
of the vectors $p_i - \pi_0$. Notice that $\langle p_i - \pi_0, e \rangle = 0$ for
i = 1, ..., N, so $p_i - \pi_0$ lies in the hyperplane H. A hyperplane is a
linear subspace, so any linear combination of vectors in H will also
lie in H. Therefore the null space is a subset of H.

Now, let x be any vector in H. Then x is written as

$$x = \alpha U$$

If $x \in N(I - P + P_0)$ then

$$\alpha U\, (I - P + P_0) = 0$$

or

$$\alpha U - \alpha U P + \alpha U P_0 = 0 \tag{2.11}$$

Matrix $U P_0$ is evaluated in the following manner. The ij-th
element is given by

$$\pi_{0j}\, \langle u_i, e \rangle, \qquad i = 1, \ldots, N-1, \quad j = 1, \ldots, N$$

Since $u_i$ lies in H, $\langle u_i, e \rangle = 0$. Therefore, $U P_0$ is the zero
matrix. Using this result, Equation 2.11 becomes

$$x = xP \tag{2.12}$$

Equation 2.12 implies that vector $x \in H$ is an eigenvector
associated with the unity eigenvalue. However, when P is
ergodic, the steady state probability vector is the unique eigenvector
associated with the unity eigenvalue [1], so x must be a scalar
multiple of $\pi$. Since $\langle \pi, e \rangle = 1$ while $\langle x, e \rangle = 0$, that scalar is
zero. Therefore, x is the zero vector and the null space of matrix
$(I - P + P_0)$ is void.

QED


Example 2.2: Consider a two state Markov chain with transition
matrix

$$P = \begin{pmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{pmatrix}$$

The eigenvector z associated with the unity eigenvalue satisfies
z = zP. Therefore,

$$z = (z_1, z_2) = \left( \tfrac{1}{2} z_1 + \tfrac{1}{4} z_2,\;\; \tfrac{1}{2} z_1 + \tfrac{3}{4} z_2 \right)$$

Since z is a probability vector

$$z_1 + z_2 = 3 z_1 = 1$$

and

$$\pi = (1/3,\; 2/3)$$
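
Equation 2.10 can be checked against the eigenvector computation of Example 2.2. The following is a minimal sketch in exact rational arithmetic; the helper `solve_left` is an ordinary Gauss-Jordan elimination written only for this illustration, and both function names are hypothetical.

```python
from fractions import Fraction as F


def solve_left(M, b):
    """Solve x M = b for the row vector x by Gauss-Jordan elimination."""
    n = len(M)
    # x M = b is the same as M^T x^T = b^T; build the augmented matrix [M^T | b^T].
    A = [[M[j][i] for j in range(n)] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def steady_state(P, pi0):
    """pi = pi_0 (I - P + P_0)^{-1}, where every row of P_0 equals pi_0."""
    n = len(P)
    M = [[(1 if i == j else 0) - P[i][j] + pi0[j] for j in range(n)]
         for i in range(n)]
    return solve_left(M, pi0)


P = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
pi = steady_state(P, [F(1), F(0)])   # (1/3, 2/3), as in Example 2.2
```

The derivation of Equation 2.10 uses only the fact that $\pi_0$ is a probability vector, so a different choice such as $\pi_0 = (1/2, 1/2)$ gives the same steady state.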
2.4.2 The Function $w: t \to s$

The equation $s = w(t)$ can be derived from equation $\pi = \sigma(P)$,
using the transformations $\pi \to s$ and $P \to t$. However, the equation
$s = w(t)$ will be derived independent of $\pi = \sigma(P)$.

Vector $\pi(n)$ can be expressed as

$$\pi(n) = \pi_0 + s(n)\, U$$

Substituting this expression into Equation 2.8 gives

$$s(n+1)\, U = s(n)\, U P + \pi_0 P - \pi_0 \tag{2.13}$$

Using the identity UV = I, Equation 2.13 is transformed to

$$s(n+1) = s(n)\, U P V + \pi_0\, (P V - V) \tag{2.14}$$

The asymptotic behavior of Equation 2.14 is given by

$$s = s\, U P V + \pi_0\, (P V - V) \tag{2.15}$$

Matrix PV can be developed further. The i-th row of matrix P
is given by

$$p_i = \pi_0 + s_i U$$

Therefore, matrix P satisfies

$$P = P_0 + S U \tag{2.16}$$

where matrix S is defined by

$$S = \begin{pmatrix} s_1 \\ \vdots \\ s_N \end{pmatrix}$$

Post multiplying Expression 2.16 by V gives

$$P V = P_0 V + S \tag{2.17}$$

since UV = I. Substitute Equation 2.17 into Equation 2.15 to get

$$s = s\, U P_0 V + s\, U S + \pi_0\, (P_0 V - V + S)$$

Using the identity $U P_0 = 0$ and the fact that $\pi_0 P_0 = \pi_0$,

$$s = s\, U S + \pi_0 S$$

Solving for s gives

$$s = \pi_0 S\, (I - U S)^{-1} \tag{2.18}$$

Symbolically, Equation 2.18 is written as

$$s = w(t)$$

where

$$t = (s_1, s_2, \ldots, s_N)$$

The following theorem proves the existence of $(I - US)^{-1}$ when $P$ is ergodic.

Theorem 2.2: If matrix $P$ is ergodic, then $(I - US)^{-1}$ exists.
Proof: Matrix $I - US$ has an inverse if it has full rank. Expression $I - US$ can be written as

$$I - US = I - UPV = UV - UPV + UP_0V = U(I - P + P_0)V$$

Matrix $I - P + P_0$ has rank $N$ when $P$ is ergodic. Matrices $U$ and $V$ have rank $N-1$. Therefore, matrix $U(I - P + P_0)V$ is an $(N-1) \times (N-1)$ matrix with rank $N-1$.

QED

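Example 2.3: Consider a two state Markov chain. Take $\pi_0$ and $U$ as defined previously,

$$\pi_0 = (1,\ 0), \qquad U = \tfrac{1}{\sqrt{2}}(-1,\ 1)$$

Equation 2.18 becomes

$$s = (1,\ 0)\begin{bmatrix} s_1 \\ s_2 \end{bmatrix}\left(1 - \tfrac{1}{\sqrt{2}}(-1,\ 1)\begin{bmatrix} s_1 \\ s_2 \end{bmatrix}\right)^{-1}$$

and

$$s = \frac{\sqrt{2}\, s_1}{\sqrt{2} + s_1 - s_2}$$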
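The two-state form of Equation 2.18 can be exercised numerically. The sketch below is illustrative only: the helper names are hypothetical, $U$ is assumed to be $(-1, 1)/\sqrt{2}$, and the test matrix is chosen for illustration. It maps each row of $P$ to its coordinate, applies $s = \pi_0 S (I - US)^{-1}$, and converts $s$ back to a probability vector.

```python
import math

SQRT2 = math.sqrt(2.0)

def row_to_s(row):
    """Coordinate s of a two-state row p = pi_0 + s*U, with U = (-1, 1)/sqrt(2)."""
    return SQRT2 * row[1]

def w_two_state(s1, s2):
    """Equation 2.18 for N = 2: s = pi_0 S (I - US)^{-1}."""
    pi0_S = s1                    # (1, 0) . (s1, s2)^T
    U_S = (s2 - s1) / SQRT2       # (-1, 1)/sqrt(2) . (s1, s2)^T
    return pi0_S / (1.0 - U_S)

def steady_state_from_s(s):
    """Map s back to the probability vector pi = pi_0 + s*U."""
    return (1.0 - s / SQRT2, s / SQRT2)

P = [[0.8, 0.2], [0.5, 0.5]]      # illustrative transition matrix
s = w_two_state(row_to_s(P[0]), row_to_s(P[1]))
pi = steady_state_from_s(s)
```

The division fails only when $s_2 - s_1 = \sqrt{2}$, which corresponds to the identity matrix, a non-ergodic case.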



2.5 Computing $f_\Pi(s)$ from $f_P(t)$

The probability distribution over set $A^N$ was given as

$$h_P(P) = h_1(p_1) \cdots h_N(p_N)$$

Density $h_P(P)$ can be easily transformed to density $f_P(t)$. The procedure will be outlined in the next Chapter. Density $f_P(t)$ is transformed to density $f_\Pi(s)$ over the set $T_\Pi^N$ using the equation $s = w(t)$ in this section.

If the function $w(\cdot)$ were one-to-one and continuous then the density $f_\Pi(s)$ would be simply $f_P(w^{-1}(s))$. However, the function $w(\cdot)$ is many-to-one and the inverse set $w^{-1}(s)$ is dense in $T_P^N$. Therefore, density $f_\Pi(s)$ is computed by first constructing the distribution function over set $T_\Pi^N$ and then taking the derivative with respect to vectors in $T_\Pi^N$.

The distribution function, $F_\Pi(a)$, is the probability mass over the set

$$A = \{ s \in T_\Pi^N \mid s < a \}$$

The inverse set $w^{-1}(A)$ is given by

$$w^{-1}(A) = \{ t \in T_P^N \mid w(t) < a \}$$

The distribution function can be written as

$$F_\Pi(a) = \int_A f_\Pi(s)\, ds$$

or

$$F_\Pi(a) = \int_{w^{-1}(A)} f_P(t)\, dt$$

The density $f_\Pi(s)$ is computed by taking the derivative of $F_\Pi(a)$ at $a = s$,

$$f_\Pi(s) = \left. \frac{\partial^{N-1} F_\Pi(a)}{\partial a_1 \cdots \partial a_{N-1}} \right|_{a=s} \tag{2.20}$$


Consider the inverse set $w^{-1}(A)$. The expression $w(t) < a$ can be written as

$$\pi_0 S (I - US)^{-1} < a$$

using Equation 2.18. Rearranging gives

$$(\pi_0 + aU)S < a$$

Vector $\pi_0 + aU$ is represented by the vector

$$a = (a_1, \ldots, a_{N-1})$$

if the basis $x$ defined previously is used. Notice that when $a \in T_\Pi^N$ then $\pi_0 + aU \in \Pi^N$. Now, the expression is written as

$$(\pi_0 + aU)S = tB_a$$

where

$$B_a = \begin{bmatrix} \dfrac{\sqrt{2} - \langle a, e \rangle}{\sqrt{2}}\, I_{N-1} \\ \dfrac{a_1}{\sqrt{2}}\, I_{N-1} \\ \vdots \\ \dfrac{a_{N-1}}{\sqrt{2}}\, I_{N-1} \end{bmatrix}$$

and $I_{N-1}$ is the $(N-1) \times (N-1)$ identity matrix. The notation $B_a$ is employed to indicate that matrix $B_a$ is a function of vector $a$.

Using the above development, the inverse set becomes

$$w^{-1}(A) = \{ t \in T_P^N \mid tB_a < a \} \tag{2.21}$$

Once vector $a$ is selected, the above expression specifies a well defined subset of $T_P^N$. In the $N$ state case ($N > 2$) both integration and differentiation in $T_P^N$ can be carried out on the computer.

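Example 2.4: Consider a two state Markov chain with reward vector $r$. The density $f_P(t) = f_1(s_1) f_2(s_2)$ is assumed known. Set $A$ is given by

$$A = \{ s \in T_\Pi^2 \mid s < a \}$$

and set $w^{-1}(A)$ is

$$w^{-1}(A) = \{ t \in T_P^2 \mid tB_a < a \}$$

Expression $tB_a$ becomes

$$tB_a = (s_1,\ s_2)\begin{bmatrix} \dfrac{\sqrt{2} - a}{\sqrt{2}} \\ \dfrac{a}{\sqrt{2}} \end{bmatrix} = \frac{s_1(\sqrt{2} - a) + s_2 a}{\sqrt{2}}$$

Therefore, set $w^{-1}(A)$ is

$$w^{-1}(A) = \{ t \in T_P^2 \mid s_1(\sqrt{2} - a) + s_2 a < \sqrt{2}\, a \}$$

and is drawn in Figure 2.7.

Density $f_\Pi(a)$ is computed for two cases: $\sqrt{2}/2 < a < \sqrt{2}$ and $0 < a < \sqrt{2}/2$.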



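CASE I: $\sqrt{2}/2 < a < \sqrt{2}$

$$F_\Pi(a) = \int_{s_1=0}^{\sqrt{2}} \int_{s_2=0}^{\frac{a\sqrt{2} + s_1(a - \sqrt{2})}{a}} f_1(s_1)\, f_2(s_2)\, ds_2\, ds_1$$

using the fact that $f_P(s_1, s_2)$ factors into $f_1(s_1) f_2(s_2)$. Continuing,

$$F_\Pi(a) = \int_{s_1=0}^{\sqrt{2}} f_1(s_1) \int_{s_2=0}^{\frac{a\sqrt{2} + s_1(a - \sqrt{2})}{a}} f_2(s_2)\, ds_2\, ds_1 = \int_{s_1=0}^{\sqrt{2}} f_1(s_1)\, g(s_1, a)\, ds_1$$

Computing $f_\Pi(a)$ from Equation 2.20 gives

$$f_\Pi(a) = \frac{d}{da} \int_{s_1=0}^{\sqrt{2}} f_1(s_1)\, g(s_1, a)\, ds_1 = \int_{s_1=0}^{\sqrt{2}} f_1(s_1)\, \frac{\partial g(s_1, a)}{\partial a}\, ds_1 \tag{2.22}$$

The derivative of the function $g(s_1, a)$ is evaluated using Leibniz's rule,

$$\frac{\partial g(s_1, a)}{\partial a} = f_2\!\left(\frac{a\sqrt{2} + s_1(a - \sqrt{2})}{a}\right) \frac{\partial}{\partial a}\!\left[\frac{a\sqrt{2} + s_1(a - \sqrt{2})}{a}\right] = f_2\!\left(\frac{a\sqrt{2} + s_1(a - \sqrt{2})}{a}\right) \frac{\sqrt{2}\, s_1}{a^2} \tag{2.23}$$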
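Substituting Equation 2.23 into Equation 2.22 gives

$$f_\Pi(a) = \int_{s_1=0}^{\sqrt{2}} f_1(s_1)\, f_2\!\left(\frac{a\sqrt{2} + s_1(a - \sqrt{2})}{a}\right) \frac{\sqrt{2}\, s_1}{a^2}\, ds_1$$

CASE II: $0 < a < \sqrt{2}/2$

In this case the outer integration is carried out over $s_2$, with $s_1$ running from $0$ to $a(\sqrt{2} - s_2)/(\sqrt{2} - a)$. Proceeding as in Case I gives, finally,

$$f_\Pi(a) = \int_{s_2=0}^{\sqrt{2}} f_2(s_2)\, f_1\!\left(\frac{a(\sqrt{2} - s_2)}{\sqrt{2} - a}\right) \frac{\sqrt{2}\,(\sqrt{2} - s_2)}{(\sqrt{2} - a)^2}\, ds_2$$

To summarize,

$$f_\Pi(s) = \int_{s_2=0}^{\sqrt{2}} f_2(s_2)\, f_1\!\left(\frac{s(\sqrt{2} - s_2)}{\sqrt{2} - s}\right) \frac{\sqrt{2}\,(\sqrt{2} - s_2)}{(\sqrt{2} - s)^2}\, ds_2, \qquad 0 < s < \sqrt{2}/2 \tag{2.24}$$

$$f_\Pi(s) = \int_{s_1=0}^{\sqrt{2}} f_1(s_1)\, f_2\!\left(\frac{s\sqrt{2} + s_1(s - \sqrt{2})}{s}\right) \frac{\sqrt{2}\, s_1}{s^2}\, ds_1, \qquad \sqrt{2}/2 \le s < \sqrt{2} \tag{2.25}$$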
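The two-case density of Example 2.4 can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not the author's program: it assumes the two-state boundary $s_1(\sqrt{2} - s) + s_2 s = \sqrt{2}\, s$, integrates over $s_2$ when $s < \sqrt{2}/2$ and over $s_1$ otherwise, and uses uniform row densities purely as a test input. The names `midpoint`, `density_pi` and `uniform` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def midpoint(func, lo, hi, n):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

def density_pi(s, f1, f2, n=400):
    """Two-state density of s built from the row densities f1 and f2."""
    if not 0.0 < s < SQRT2:
        return 0.0
    if s < SQRT2 / 2.0:
        # Integrate over s2; s1 lies on the curve w(t) = s.
        def integrand(s2):
            s1 = s * (SQRT2 - s2) / (SQRT2 - s)
            return f2(s2) * f1(s1) * SQRT2 * (SQRT2 - s2) / (SQRT2 - s) ** 2
    else:
        # Integrate over s1; s2 lies on the curve w(t) = s.
        def integrand(s1):
            s2 = (s * SQRT2 + s1 * (s - SQRT2)) / s
            return f1(s1) * f2(s2) * SQRT2 * s1 / s ** 2
    return midpoint(integrand, 0.0, SQRT2, n)

def uniform(x):
    """Uniform density on [0, sqrt(2)] (illustrative choice only)."""
    return 1.0 / SQRT2 if 0.0 <= x <= SQRT2 else 0.0

# A proper density should integrate to one over (0, sqrt(2)).
total_mass = midpoint(lambda s: density_pi(s, uniform, uniform), 0.0, SQRT2, 400)
```

With uniform inputs the integrals can also be done by hand, which gives a closed form to compare against; other row densities can be passed in the same way.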


2.6 Evaluating $E(\Lambda)$

The gain is given by

$$\Lambda = \langle \pi, r \rangle \tag{2.26}$$

Probability vector $\pi$ can be written as

$$\pi = \pi_0 + sU$$

$$\pi = \pi_0 + w(t)U$$

Substituting into Equation 2.26 gives

$$\Lambda = \langle \pi_0, r \rangle + \langle w(t)U, r \rangle$$

Since density $h_P(P)$ is assumed known, density $f_P(t)$ is known. Therefore, $E(\Lambda)$ becomes

$$E(\Lambda) = \int_{T_P^N} \langle w(t)U, r \rangle\, f_P(t)\, dt + \langle \pi_0, r \rangle \tag{2.27}$$

where

$$w(t) = \pi_0 S (I - US)^{-1}$$

Example 2.5: Consider a two state Markov chain with known density $f_P(t) = f_1(s_1) f_2(s_2)$. The reward vector is $r$. The expected value of the gain using Equation 2.27 and Equation 2.19 is

$$E(\Lambda) = \langle U, r \rangle \int_{T_P^2} \frac{\sqrt{2}\, s_1}{\sqrt{2} + s_1 - s_2}\, f_1(s_1)\, f_2(s_2)\, ds_1\, ds_2 + \langle \pi_0, r \rangle$$
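The two-state expected gain can be approximated with a simple product midpoint rule. This is a hedged sketch rather than the author's procedure: it assumes $\pi_0 = (1, 0)$ and $U = (-1, 1)/\sqrt{2}$, and the uniform row densities and the reward vector $(6, -3)$ are illustrative inputs. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def w(s1, s2):
    """Two-state form of s = w(t)."""
    return SQRT2 * s1 / (SQRT2 + s1 - s2)

def expected_gain(f1, f2, r, n=200):
    """Approximate E(gain) = <U, r> * integral(w f1 f2) + <pi_0, r>."""
    h = SQRT2 / n
    grid = [(k + 0.5) * h for k in range(n)]   # midpoints avoid the corners
    integral = 0.0
    for s1 in grid:
        for s2 in grid:
            integral += w(s1, s2) * f1(s1) * f2(s2)
    integral *= h * h
    u_dot_r = (r[1] - r[0]) / SQRT2            # <U, r>
    pi0_dot_r = r[0]                           # <pi_0, r>
    return u_dot_r * integral + pi0_dot_r

def uniform(x):
    """Uniform density on [0, sqrt(2)] (illustrative choice only)."""
    return 1.0 / SQRT2

gain = expected_gain(uniform, uniform, r=(6.0, -3.0))
```

For uniform row densities the two states are exchangeable, so the expected steady state vector is $(1/2, 1/2)$ and the expected gain is the average of the two rewards, which gives a simple check on the integration.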


2.7 Ergodicity

The concept of ergodicity in the set $T_P^N$ is developed next. If a transition matrix has all non-zero entries, then it is ergodic [9]. Since the equation $\pi = \sigma(P)$ holds for all ergodic matrices $P$, the equation $s = w(t)$ holds for all vectors $t$ that represent ergodic matrices. The subset of $T_P^N$ that corresponds to non-ergodic matrices must be identified. If this subset is assigned non-zero probability mass by the matrix beta density, the technique of computing $f_\Pi(s)$ from $f_P(t)$ using $s = w(t)$ may not be valid.

Take a vector $p \in \Pi^N$. Probability vector $p$ can be thought of as a row of matrix $P$ and is written as

$$p = \pi_0 + sU = (1, 0, \ldots, 0) + s\, \frac{1}{\sqrt{2}} \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 \\ -1 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ -1 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

$$= (1, 0, \ldots, 0) + \left( -\frac{\langle s, e \rangle}{\sqrt{2}},\ \frac{s_1}{\sqrt{2}},\ \ldots,\ \frac{s_{N-1}}{\sqrt{2}} \right) = \left( 1 - \frac{\langle s, e \rangle}{\sqrt{2}},\ \frac{s_1}{\sqrt{2}},\ \ldots,\ \frac{s_{N-1}}{\sqrt{2}} \right)$$

If $p_i = 0$ then either $s_{i-1} = 0$ or, in the case where $i = 1$, $\langle s, e \rangle = \sqrt{2}$. From the definition of set $T_\Pi^N$ in basis $x$, these two conditions place the resulting vector $s$ on the boundary of $T_\Pi^N$.

In the next Chapter the matrix beta density is defined. This density assigns all its probability mass to transition matrices that have all non-zero entries. Therefore, the matrix beta density assigns zero probability to the boundary of $T_P^N$. Recall that equation $s = w(t)$ can fail only on the boundary of $T_P^N$, where the non-ergodic matrices lie. Therefore, the integral of $w(t)$ times the matrix beta density over $T_P^N$ exists. Further, the method of computing $f_\Pi(s)$ from $f_P(t)$ using $s = w(t)$ is valid.
Chapter 3

COMPUTING THE EXPECTED VALUE OF THE GAIN
CONDITIONED ON OBSERVATIONS

In the last Chapter the uncertainty over the transition probabilities was characterized by a general density function $h_P(P)$. In this Chapter density $h_P(P)$ is specified as matrix beta. Observations are recorded from the states of nature $\Theta$. The posteriori density is computed using Bayes' formula. The expected value of the gain conditioned on the observations is computed from the posteriori density.


3.1 The Matrix Beta Density

The transition probabilities $P = [p_{ij}]$ are said to have the matrix beta density with "parameter" $M = [m_{ij}]$ if $P$ has the joint density

$$f_{MB}(P \mid M) = \begin{cases} k(M) \displaystyle\prod_{i=1}^{N} \prod_{j=1}^{N} p_{ij}^{\,m_{ij} - 1}, & P \in A^N \\ 0, & \text{elsewhere} \end{cases}$$

The normalizing constant $k(M)$ is given by

$$k(M) = \prod_{i=1}^{N} \frac{\Gamma(M_i)}{\prod_{j=1}^{N} \Gamma(m_{ij})}$$

where

$$M_i = \sum_{j=1}^{N} m_{ij}, \qquad i = 1, \ldots, N$$

The "parameter" $M$ is an $N \times N$ matrix such that $m_{ij} > 0$, $i, j = 1, \ldots, N$.

Notice that the rows of $P$ are independent random probability vectors when $P$ has the matrix beta density. Therefore, the density function can be factored as

$$f_{MB}(P \mid M) = \prod_{i=1}^{N} h_i(p_i \mid m_i)$$

where $m_i$ is the $i$th row of matrix $M$. The probability density $h_i(p_i \mid m_i)$ is given by

$$h_i(p_i \mid m_i) = k(m_i) \prod_{j=1}^{N} p_{ij}^{\,m_{ij} - 1} \tag{3.1}$$

where

$$k(m_i) = \frac{\Gamma(M_i)}{\prod_{j=1}^{N} \Gamma(m_{ij})}$$

Density $h_i(p_i \mid m_i)$ is the density over the $i$th row of matrix $P$, and is called the vector beta density with "parameter" $m_i$.

The following statistics on $p_{ij}$ are derived using the density function in Expression 3.1:

$$\text{i)}\quad E(p_{ij}) = \frac{m_{ij}}{M_i}$$

$$\text{ii)}\quad \operatorname{Var}(p_{ij}) = \frac{m_{ij}(M_i - m_{ij})}{M_i^2 (M_i + 1)} \tag{3.2}$$

$$\text{iii)}\quad \operatorname{Cov}(p_{ij}, p_{in}) = \frac{-\, m_{ij}\, m_{in}}{M_i^2 (M_i + 1)}$$
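The three statistics can be evaluated directly from one row of $M$. The sketch below is illustrative: the function name `vector_beta_moments` is hypothetical, and exact fractions are used only to make the results easy to compare with hand calculations.

```python
from fractions import Fraction

def vector_beta_moments(m):
    """Mean, variance and covariance of one row of P with vector beta parameter m."""
    m = [Fraction(x) for x in m]
    total = sum(m)                          # M_i
    denom = total ** 2 * (total + 1)        # M_i^2 (M_i + 1)
    mean = [x / total for x in m]
    var = [x * (total - x) / denom for x in m]
    size = len(m)
    # Diagonal entries hold the variances, off-diagonal entries the covariances.
    cov = [[var[j] if j == n else -m[j] * m[n] / denom for n in range(size)]
           for j in range(size)]
    return mean, var, cov

mean, var, cov = vector_beta_moments([2, 2, 2])
```

Because the components of a row sum to one, each row of the covariance matrix sums to zero, which is a quick consistency check on the formulas.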


3.2 The A Priori Density

The states of nature $\Theta$ in the case of a single Markov chain is just the set $A^N$. The a priori density over $A^N$ is specified as the matrix beta density with "parameter" $M$,

$$\xi_0(P) = f_{MB}(P \mid M)$$

Given this definition, the problem becomes how to specify the elements of matrix $M$ so that the a priori density reflects, in some way, the a priori knowledge of the transition probabilities. Since the transition vectors are probabilistically independent, the rows of $M$ can be selected independently.

For example, suppose the value of $m_i$ is specified as

$$m_{i1} = m_{i2} = \cdots = m_{iN} = m$$

The expected value of $p_{ij}$ is given by condition i) of Equation 3.2,

$$E(p_{ij}) = \frac{m_{ij}}{M_i} = \frac{m}{Nm} = \frac{1}{N}, \qquad j = 1, \ldots, N$$

The magnitude of $m$ does not affect the expected value. However, the magnitude does affect the variance and covariance. Substituting $m_{ij} = m$ into conditions ii) and iii) gives

$$\operatorname{Var}(p_{ij}) = \frac{N - 1}{N^2 (Nm + 1)}, \qquad j = 1, \ldots, N$$

$$\operatorname{Cov}(p_{ij}, p_{in}) = \frac{-1}{N^2 (Nm + 1)}, \qquad j, n = 1, \ldots, N, \quad j \ne n$$

As $m$ increases $\operatorname{Var}(p_{ij})$ and $\operatorname{Cov}(p_{ij}, p_{in})$ go to zero.

The example illustrates the fact that the relative proportions of the $m_{ij}$ will specify the expected value of the $p_{ij}$ and the magnitude will specify the variance and covariance. In other words, if $\bar{p}_{ij}$ is the expected value of $p_{ij}$ then $m_{ij}$ will be given by $m_{ij} = \alpha\, \bar{p}_{ij}$ where $\alpha > 0$. The magnitude of $\alpha$ specifies the variance and covariance.
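The rule $m_{ij} = \alpha\, \bar{p}_{ij}$ can be illustrated with a short sketch. The helper names and the numbers below are hypothetical and are not taken from the text; the point is only that two priors with the same expected row but different $\alpha$ differ in variance.

```python
def prior_row(expected, alpha):
    """Build one row of M from expected transition probabilities and a strength alpha."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if any(p <= 0 for p in expected) or abs(sum(expected) - 1.0) > 1e-9:
        raise ValueError("expected must be a strictly positive probability vector")
    return [alpha * p for p in expected]

def row_variances(m):
    """Var(p_ij) = m_ij (M_i - m_ij) / (M_i^2 (M_i + 1)) for each entry of the row."""
    total = sum(m)
    return [x * (total - x) / (total ** 2 * (total + 1)) for x in m]

weak = prior_row([0.8, 0.2], alpha=5)      # loosely held belief
strong = prior_row([0.8, 0.2], alpha=50)   # same belief, held more firmly
var_weak = row_variances(weak)
var_strong = row_variances(strong)
```

In this parameterization the variance of each entry reduces to $\bar{p}_{ij}(1 - \bar{p}_{ij})/(\alpha + 1)$, so increasing $\alpha$ shrinks the spread without moving the mean.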

Example 3.1: Consider a two state Markov chain. Let

$$m_{i1} = m_{i2} = k, \qquad i = 1, 2$$

where $k$ is a positive integer. The expected value of $p_{ij}$ is given by

$$E(p_{ij}) = 1/2, \qquad i, j = 1, 2$$

The variance and covariance of $p_{ij}$ are given by

$$\operatorname{Var}(p_{ij}) = \frac{1}{4(2k + 1)}$$

$$\operatorname{Cov}(p_{ij}, p_{in}) = \frac{-1}{4(2k + 1)}, \qquad j \ne n$$

The values of $\operatorname{Var}(p_{ij})$ and $\operatorname{Cov}(p_{ij}, p_{in})$ for several values of $k$ are listed in Table 3.1.

For each value of $k$ the densities $h_i(p_i \mid m_i)$, $i = 1, 2$ are given by

$$h_i(p_i \mid m_i) = k(m_i) \prod_{j=1}^{2} (p_{ij})^{k-1}, \qquad k(m_i) = \frac{\Gamma(2k)}{[\Gamma(k)]^2}$$

Density $h_i(p_i \mid m_i)$ can be drawn as in Figure 2.1, or can be transformed to a density $f_i(s_i \mid m_i)$ as drawn in Figure 2.2. The details on how $f_i(s_i \mid m_i)$ is computed are covered in Section 3.4. However, as seen from Figures 2.1 and 2.2, density $f_i(s_i \mid m_i)$ has the same shape as density $h_i(p_i \mid m_i)$ in the two state case, and therefore should cause no confusion. The transformed density $f_i(s_i \mid m_i)$ is drawn in Figure 3.1 for several values of $k$. The density is seen to concentrate more of its probability mass near $s = 0.707$ as $k$ increases. This is equivalent to saying that the covariance and variance are going to zero as $k$ increases.

Table 3.1

k      Var(p_ij)    Cov(p_ij, p_in)
1      0.05         -0.05
2      0.028        -0.028
5      0.012        -0.012
15     0.004        -0.004

Figure 3.1  A Priori Density

The following method is proposed to choose matrix $M$ in the a priori density.

1. If the transition probabilities are completely unknown set $m_{ij} = 1$, $i, j = 1, \ldots, N$.

2. If the knowledge of the transition probabilities is more precise, select vectors $m_i$, $i = 1, \ldots, N$ such that the resulting expected values $E(p_{ij})$ lie in the expected range. Select the magnitude to be proportional, in some way, to this knowledge.
3.3 The Posteriori Density

The posteriori density over $A^N$ is computed from the a priori density, observations taken from the Markov chain, and Bayes' formula. Let $\mathbf{x}_n = \{x_0, x_1, \ldots, x_n\}$ be a sequence of states observed from the $N$ state Markov chain where $x_0$, the initial state, is known. The probability of observing $\mathbf{x}_n$ is given by

$$p_{x_0 x_1}\, p_{x_1 x_2} \cdots p_{x_{n-1} x_n}$$

where $p_{x_i x_{i+1}}$ is the probability of making a transition from state $x_i$ to state $x_{i+1}$ in sequence $\mathbf{x}_n$. Let $f_{ij}$ denote the number of transitions observed from state $i$ to state $j$ in sequence $\mathbf{x}_n$. The $N \times N$ matrix $F = [f_{ij}]$ is called the transition count matrix of the sample $\mathbf{x}_n$.

The conditional probability of observing $\mathbf{x}_n$ given a $P \in A^N$ is denoted by $l(\mathbf{x}_n \mid P)$ where

$$l(\mathbf{x}_n \mid P) = p_{x_0 x_1} \cdots p_{x_{n-1} x_n} = \prod_{i=1}^{N} \prod_{j=1}^{N} (p_{ij})^{f_{ij}}$$

The function $l(\mathbf{x}_n \mid P)$ is called the likelihood function.

The posteriori density over $A^N$, denoted by $\xi(P \mid \mathbf{x}_n)$, is computed using Bayes' formula

$$\xi(P \mid \mathbf{x}_n) = \frac{l(\mathbf{x}_n \mid P)\, \xi_0(P)}{\int_{A^N} l(\mathbf{x}_n \mid P)\, \xi_0(P)\, dP} = \frac{k(M)}{\int_{A^N} l(\mathbf{x}_n \mid P)\, \xi_0(P)\, dP} \prod_{i=1}^{N} \prod_{j=1}^{N} (p_{ij})^{m_{ij} + f_{ij} - 1}$$

Thus, $\xi(P \mid \mathbf{x}_n)$ has the same form as the matrix beta density with parameter $M + F$. Since $\int_{A^N} \xi(P \mid \mathbf{x}_n)\, dP = 1$, $\xi(P \mid \mathbf{x}_n)$ has the form

$$\xi(P \mid \mathbf{x}_n) = f_{MB}(P \mid M + F) \tag{3.3}$$

Martin [9] proved that as the number of observations goes to infinity the density $f_{MB}(P \mid M + F)$ will concentrate on one $P \in A^N$.
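The update from $M$ to $M + F$ amounts to counting transitions. The sketch below is a minimal illustration with hypothetical function names and a made-up observation sequence; states are numbered from 0 for convenience.

```python
def transition_counts(states, n_states):
    """Count matrix F: F[i][j] is the number of observed transitions i -> j."""
    F = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        F[a][b] += 1
    return F

def posterior_parameter(M, F):
    """Parameter of the posteriori matrix beta density, M + F."""
    return [[m + f for m, f in zip(m_row, f_row)] for m_row, f_row in zip(M, F)]

def posterior_mean(M_post):
    """E(p_ij) = m_ij / M_i evaluated with the posteriori parameter."""
    return [[m / sum(row) for m in row] for row in M_post]

observed = [0, 0, 1, 0, 0, 0, 1, 1, 0]   # made-up sequence, x_0 = state 0
M = [[1, 1], [1, 1]]                     # "completely unknown" prior
F = transition_counts(observed, 2)
M_post = posterior_parameter(M, F)
P_hat = posterior_mean(M_post)
```

A sequence of $n + 1$ states yields $n$ transitions, so the entries of $F$ always sum to the number of transitions observed.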

3.4 The Posteriori Density $f_P(t \mid M+F)$

This section deals with the transformation of $f_{MB}(P \mid M+F)$ to $f_P(t \mid M+F)$.

Given the matrix beta density over $A^N$,

$$f_{MB}(P \mid M+F) = h_1(p_1 \mid m_1 + f_1) \cdots h_N(p_N \mid m_N + f_N)$$

the corresponding density over $T_P^N$,

$$f_P(t \mid M+F) = f_1(s_1 \mid m_1 + f_1) \cdots f_N(s_N \mid m_N + f_N)$$

is derived by transforming the factored density $h_i(p_i \mid m_i + f_i)$ to the factored density $f_i(s_i \mid m_i + f_i)$ instead of transforming $f_{MB}(P \mid M+F)$ to $f_P(t \mid M+F)$ directly. A basis for $E^N$ is given by $x' = \{x, \pi_0\}$. Transition probability vectors $p_i$ have the following representation in basis $x'$,

$$(s_{i1}, s_{i2}, \ldots, s_{iN-1}, 1), \qquad i = 1, \ldots, N$$
The transformation that takes $N$ vectors in the natural basis to $N$ vectors in basis $x'$ is the $N \times N$ matrix $V'$

v'  =  [v;.T] 

Therefore, 

(*il»  •  *  •  *  i )  =  (Pjj#  •  •  • »  Pjjj)  V' 

=  . <E4'Zn-1>' 

The  Jacobian  of  this  transformation  is  the  NxN  matrix  J  where 


The  density  |  is  given  by 

fi(^lSi+iLi>  ^(ILo+liUlm.+f.)  (3.4) 

Therefore 

fp(t_|M+F)  =  n  hil^+^Ulm.+fj)  (3.5) 

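Example 3.2: Consider a two state Markov chain. Density $f_{MB}(P \mid M+F)$ is given by

$$f_{MB}(P \mid M+F) = h_1(p_1 \mid m_1 + f_1)\, h_2(p_2 \mid m_2 + f_2)$$

where

$$h_i(p_i \mid m_i + f_i) = \frac{\Gamma(M_i + F_i)}{\Gamma(m_{i1} + f_{i1})\, \Gamma(m_{i2} + f_{i2})}\; p_{i1}^{\,m_{i1} + f_{i1} - 1}\; p_{i2}^{\,m_{i2} + f_{i2} - 1}, \qquad i = 1, 2$$

The density over $T_P^2$, $f_P(t \mid M+F)$, is given by

$$f_P(t \mid M+F) = f_1(s_1 \mid m_1 + f_1)\, f_2(s_2 \mid m_2 + f_2)$$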


where 


r(M,+F.)  /  s4\mn+f.. 

=  — - — - (1-4)  K 


(3.6) 


^jmi2+fi2 


i=l,2 


det  [j  ]  =  det 


=  41 


\  ■  ■  i 

L-lA/Z  Ujl\ 


3.5 The Posteriori Density $f_\Pi(s \mid M+F)$

The posteriori density over $T_\Pi^N$, $f_\Pi(s \mid M+F)$, is computed from the posteriori density over $T_P^N$, $f_P(t \mid M+F)$, using the results of Chapter 2. Density $f_\Pi(s \mid M+F)$ is given by

$$f_\Pi(s \mid M+F) = \left. \frac{\partial^{N-1} F_\Pi(a \mid M+F)}{\partial a_1 \cdots \partial a_{N-1}} \right|_{a=s}$$

Consider the special case of a two state Markov chain. Density $f_\Pi(s \mid M)$ is computed by substituting Expression 3.6 with $F = [0]$ into Equations 2.24 and 2.25,

JL  /  v  m,  ,-l 

k'(M)  J 


fn(o|M) 


where 


m 


21 


V° 


‘2  J 


/•(</!- »2)\  m*2  ‘ 

V?-./  \jz )a'2'  0<t- 


*/Z72 


m,,-l 


•'SjsO 


m22'1 


^-,(7?  -.)  j  »  ^  d>j  £<§  d 


k'(M)  =  k(M)  ^ 


The  above  expression  can  be  manipulated  to  get 


i-  k'(M) 


/_i_\mi2  (JL)1  “!> 
\^-s/  \^-»/ 


-1 


j=0 


W’ 


y»|M) 


k'(M) 


m22-1 

2  *i<8K 

j=0  J  J 


where 


m21+m12_1 


5s 


=  £ 


i=  0 


“iYij 


-  JLr 

l  1  t  .  I 


+j+m22 


ij  l+j+m22 


m12+m2rl 


£  V*  =  ^  '  •*) 


m21+m12 


i=0 


V . , .  j/z-tjz.,  yv 

L  M®)»2  l — b+bz) 
i=o  *  '  ' 


“hr1 


v  E  9«Ti 


9..  = 

Tn  i 


i+j+m21+m12 


ij  i+3+m21+m12 
$0 < s < \sqrt{2}/2$

(3.7)


1 

1 


/ 


mjj-l 


m„-l 


“  •  ***11 
Ti*i  =  (</*  -  *i)  u 


m2Z'1 


C 


Example 3.3: Consider a two state Markov chain. The a priori density is matrix beta with parameter $M$ given by

$$M = \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix}$$

Density $f_\Pi(s \mid M)$ as specified in Expression 3.7 is


fn(.|M)  = 


0  <  8 


<A 


<  8  < 


Q 


Density $f_\Pi(s \mid M)$ is drawn in Figure 3.2.

Example 3.4: Consider a two state Markov chain with transition matrix

$$P = \begin{bmatrix} 0.8 & 0.2 \\ 0.5 & 0.5 \end{bmatrix}$$

The steady state probability vector $\pi$ is

$$\pi = (5/7,\ 2/7)$$

and its representation in $T_\Pi^2$ is the scalar $s$,

$$s = 0.404$$

The a priori density over $P$ is defined as matrix beta with parameter

$$M = \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix}$$

The transition count matrix $F$ was computed from observations taken from a computer simulation of the above defined Markov chain. Matrix $F$ at 100 transitions was

$$F = \begin{bmatrix} 74 & 16 \\ 16 & 19 \end{bmatrix}$$

and at 250 transitions it was

$$F = \begin{bmatrix} 141 & 35 \\ 34 & 40 \end{bmatrix}$$

The posteriori density $f_\Pi(s \mid M+F)$ was computed by numerically integrating Equation 3.7 with $M$ replaced by $M+F$. The resulting densities are drawn in Figure 3.3. The posteriori density is indeed concentrating on $s = 0.404$.
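The concentration of the posteriori density can also be seen by sampling. The sketch below is a Monte Carlo cross-check, not the numerical integration of Equation 3.7 described in the example: it assumes that each row of $P$ is vector beta with the corresponding row of $M + F$, draws the rows with the standard library's beta sampler, and maps each draw to $s$ with the two-state form of $s = w(t)$. The seed and sample size are arbitrary.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def sample_s(M_post, rng):
    """Draw one value of s from the two-state posterior with parameter M_post."""
    # For two states the row (p_i1, p_i2) is vector beta with parameter
    # (m_i1, m_i2), so p_i2 follows Beta(m_i2, m_i1).
    p12 = rng.betavariate(M_post[0][1], M_post[0][0])
    p22 = rng.betavariate(M_post[1][1], M_post[1][0])
    s1, s2 = SQRT2 * p12, SQRT2 * p22
    return SQRT2 * s1 / (SQRT2 + s1 - s2)

M = [[3, 2], [2, 2]]
F = [[141, 35], [34, 40]]          # counts quoted at 250 transitions
M_post = [[m + f for m, f in zip(m_row, f_row)] for m_row, f_row in zip(M, F)]

rng = random.Random(0)             # fixed seed so the run is repeatable
samples = [sample_s(M_post, rng) for _ in range(20000)]
mean_s = sum(samples) / len(samples)
spread = math.sqrt(sum((x - mean_s) ** 2 for x in samples) / len(samples))
```

The sample mean and spread summarize where the posteriori mass sits; with these counts the draws cluster in the neighbourhood of the true value rather than exactly on it, as expected for a finite sample.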


Example 3.5: Consider another two state Markov chain with matrix $P$ given by

$$P = \begin{bmatrix} 0.9 & 0.1 \\ 0.9 & 0.1 \end{bmatrix}$$

Vector $\pi$ is

$$\pi = (0.9,\ 0.1)$$

and its representation in $T_\Pi^2$ is given by the scalar $s$ where

$$s = 0.1414$$

The a priori density is specified as matrix beta with parameter $M$,

$$M = \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix}$$

Matrix $F$ at 50 transitions was

$$F = \begin{bmatrix} 46 & 5 \\ 5 & 2 \end{bmatrix}$$

and at 100 transitions it was

$$F = \begin{bmatrix} 88 & 9 \\ 9 & 2 \end{bmatrix}$$

Density $f_\Pi(s \mid M+F)$ is drawn in Figure 3.4. The probability mass is concentrating at $s = 0.1414$.
Chapter 4

APPLYING BAYESIAN DECISION THEORY

In this chapter there are $K$ decisions. Each decision specifies a unique transition matrix and reward vector. The transition probabilities are unknown. In order to apply Bayesian decision theory the states of nature, a priori density, observation, and loss function have to be specified. With these elements the risk associated with each decision can be computed. The decision maker chooses the decision that minimizes his risk.

4.1 Elements of Bayesian Decision Theory

The set of states of nature was defined as

$$\Theta = \{A_1^N, A_2^N, \ldots, A_K^N\}$$

where $A_i^N$ is the set of all possible transition matrices under decision $i$. An element $\omega$ of set $\Theta$ is

$$\omega = (P_1, P_2, \ldots, P_K)$$

where $P_i$ is the $N \times N$ transition matrix under decision $i$.

The a priori density is defined as the product of matrix beta densities with parameter $M^i$, $i = 1, \ldots, K$,

$$\xi_0(\omega) = f_{MB}(P_1 \mid M^1)\, f_{MB}(P_2 \mid M^2) \cdots f_{MB}(P_K \mid M^K)$$

The observation $\mathbf{x}_n$ is given by the sequence

$$\mathbf{x}_n = \{\mathbf{x}^1_{n_1}, \mathbf{x}^2_{n_2}, \ldots, \mathbf{x}^K_{n_K}\}$$

where

$$n = \sum_{i=1}^{K} n_i$$

and $\mathbf{x}^i_{n_i}$ is the sequence of states observed under decision $i$,

$$\mathbf{x}^i_{n_i} = \{x^i_1, \ldots, x^i_{n_i}\}$$

where

$$1 \le x^i_j \le N, \qquad i = 1, \ldots, K, \qquad j = 1, \ldots, n_i$$

The posteriori density is derived from Equation 3.3,

$$\xi(\omega \mid \mathbf{x}_n) = f_{MB}(P_1 \mid M^1 + F^1) \cdots f_{MB}(P_K \mid M^K + F^K)$$

The loss function $L(i \mid \omega)$ was defined in Equation 1.1,

$$L(i \mid \omega) = \max_j \{ \langle \sigma(P_j), r^j \rangle - \langle \sigma(P_i), r^i \rangle \}$$

To select a decision after a finite number of transitions a risk is computed for each decision. The decision maker selects the decision that minimizes the risk.

There are two problems in using this technique to select a decision. First, a sampling strategy has to be specified. The sampling strategy involves the number of transitions recorded under each decision and the manner in which one decision is switched to another. Second, a stopping rule has to be specified. State transitions cannot be observed forever. Some rule that indicates when enough information has been collected is needed.

In the case where the decision process makes a finite number of transitions the sampling strategy is crucial. Two goals must be kept in mind when sampling. First, the information gained through sampling should be maximized and second, the payoff should be maximized. In the case where the decision process makes an infinite number of transitions the sampling strategy is designed to maximize the information while neglecting the payoff during the finite sampling period.

In this dissertation the decision process makes an infinite number of transitions. Since the central issue is computing the risk, the sampling strategy adopted here is simply to sample equally under each decision before selecting the risk minimizing decision. A stopping rule is not specified.

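The risk is defined as the expected loss,

$$\rho(i) = \int_{\Theta} \max_j \{ \langle \sigma(P_j), r^j \rangle - \langle \sigma(P_i), r^i \rangle \}\, \xi(\omega \mid \mathbf{x}_n)\, d\omega$$

In Chapter 1 it was shown that the risk minimizing decision $k^*$ minimizes the function $\eta_i(k)$,

$$\eta_i(k^*) = \min_k \{ \eta_i(k) \}$$

where

$$\eta_i(k) = E(\Lambda_k \mid \mathbf{x}_n) - E(\Lambda_i \mid \mathbf{x}_n) \tag{1.6}$$

Substituting Equation 3.8 with decisions into Equation 1.6 gives the desired minimizing function,

$$\eta_i(k) = \langle \pi_0, r^k - r^i \rangle + \int_{T_P^N} \langle w(t)U, r^k \rangle\, f_P(t \mid M^k + F^k)\, dt - \int_{T_P^N} \langle w(t)U, r^i \rangle\, f_P(t \mid M^i + F^i)\, dt \tag{4.1}$$

$$k = 1, \ldots, K, \qquad 1 \le i \le K$$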


4.2 Howard's Toymaker Example

The following example was used by Ron Howard [7] to illustrate the procedure used in selecting the decision that maximizes the gain. An example of a Markov decision process can be thought of as the toymaker's process. The toymaker is involved in the novelty toy business. He may be in either of two states. He is in the first state if the toy he is currently producing has found great favor with the public. He is in the second state if his toy is out of favor. Suppose that when he is in state 1 there is a $p_{11}$ percent chance of his remaining in state 1 at the end of the week and a $1 - p_{11}$ $(= p_{12})$ percent chance of a transition to state 2. When he is in state 2 he experiments with new toys, and he may return to state 1 after a week with probability $p_{21}$ or remain unprofitable in state 2 with probability $p_{22}$. A transition diagram of the system shows the states and transition probabilities in graphical form.

[Transition diagram of the two state toymaker process]

The transition matrix $P$ is given by

$$P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix}$$

When the toymaker has a successful toy he earns $r_1$ units for that week, and if his toy is unsuccessful he earns $r_2$ units for that week.

Suppose now that the toymaker has other courses of action open to him that will change the probabilities and rewards governing the process. When the toymaker has a successful toy he may use advertising to decrease the chance that the toy will fall from favor. However, because of the advertising cost, the profits to be expected per week will generally be lower. To be specific, suppose that the probability distribution for transitions from state 1 will be $p_1 = (0.8,\ 0.2)$ when advertising is employed, and that the corresponding reward will be $r_1 = 4$. The toymaker has two alternatives when he is in state 1: He may use no advertising or he may advertise. These alternatives will be labeled 1 and 2, respectively. Each alternative has its associated reward in state 1 and probability distributions for transitions out of state 1. Alternatives will be indicated by superscript. Thus, for alternative 1 in state 1, $p_1^1 = (0.5,\ 0.5)$, $r_1^1 = 6$; and for alternative 2 in state 1, $p_1^2 = (0.8,\ 0.2)$, $r_1^2 = 4$.

There may also be alternatives in state 2 of the system. Increased research expenditures may increase the probability of obtaining a successful toy, but they will also increase the cost of being in state 2. Under alternative 1, a limited research alternative, the probability distribution is $p_2^1 = (0.4,\ 0.6)$ and the reward is $r_2^1 = -3$. Under the research alternative, alternative 2, the probability and reward distribution is $p_2^2 = (0.7,\ 0.3)$ and $r_2^2 = -5$.

The alternatives for the toymaker are presented in Table 4.1.

A decision is defined as a vector of alternatives in each state. Therefore, there are four decisions. Decision one is defined as alternative 1 in state 1 and alternative 2 in state 2, and so on. Each decision specifies a transition matrix. For example, decision 3, given by alternative 2 in state 1 and alternative 1 in state 2, gives the following transition matrix,

$$P_3 = \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix}$$
The corresponding steady state probability vector is

$$\pi^3 = (0.667,\ 0.333)$$

and the gain or expected payoff is

$$\Lambda_3 = \tfrac{2}{3}(4) + \tfrac{1}{3}(-3) = 1.67$$

The steady state probability vector and gain for each decision are listed in Table 4.2. If the decision maker has perfect knowledge of the transition probabilities he will select decision 4, the decision that maximizes his expected payoff.
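The perfect-knowledge calculation for the toymaker can be reproduced for every combination of alternatives. The sketch below is illustrative: decisions are keyed by the pair (alternative in state 1, alternative in state 2) rather than by the decision numbers used in the text, the names are hypothetical, and the two-state steady state formula is used.

```python
from fractions import Fraction as Fr
from itertools import product

# (transition row, reward) for each alternative, keyed by state, then alternative.
ALTERNATIVES = {
    1: {1: ((Fr(1, 2), Fr(1, 2)), 6), 2: ((Fr(4, 5), Fr(1, 5)), 4)},
    2: {1: ((Fr(2, 5), Fr(3, 5)), -3), 2: ((Fr(7, 10), Fr(3, 10)), -5)},
}

def gain(alt_state1, alt_state2):
    """Steady state vector and gain for one choice of alternatives."""
    row1, r1 = ALTERNATIVES[1][alt_state1]
    row2, r2 = ALTERNATIVES[2][alt_state2]
    p12, p21 = row1[1], row2[0]
    pi = (p21 / (p12 + p21), p12 / (p12 + p21))
    return pi, pi[0] * r1 + pi[1] * r2

gains = {d: gain(*d)[1] for d in product((1, 2), repeat=2)}
best = max(gains, key=gains.get)
```

The pair (2, 1) reproduces the gain of 1.67 worked out above, and the largest gain is obtained when alternative 2 is used in both states.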

Now, assume that the transition probabilities are unknown. The states of nature are given by

$$\Theta = \{A^2, A^2, A^2, A^2\}$$

The a priori density is

$$\xi_0(\omega) = f_{MB}(P_1 \mid M)\, f_{MB}(P_2 \mid M)\, f_{MB}(P_3 \mid M)\, f_{MB}(P_4 \mid M)$$

where the "parameter" $M$ is selected as

$$M = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}$$

The Markov chain defined above was simulated on the computer and was observed using the following sampling strategy. Ten transitions are recorded under each decision. The decisions are switched from decision 1 to decision 2 to decision 3 to decision 4 to decision 1, and so on. The transformed risk function $\eta_i(k)$ is computed from Equation 4.1. The values of $\eta_i(k)$, $k = 1, \ldots, 4$ are plotted in Figure 4.1 every ten transition sequence. The results show that the risk minimizing decision $k^*$ is decision 4, the decision that maximizes the gain.

Table 4.1  Transition Vectors and Rewards

Table 4.2  Steady State Probabilities and Gains


Chapter 5

BAYESIAN DECISION THEORY WITH THE UNCERTAINTY
OVER THE STEADY STATE PROBABILITIES

Once density $f_P(t \mid M+F)$ is specified, the transformed risk function $\eta_i(k)$ is evaluated by integrating in the set $T_P^N$. As $N$ gets large this task becomes increasingly more difficult and time consuming. As an alternative, the a priori density could be placed over the steady state probabilities rather than the transition probabilities. This approach greatly reduces the computational exercise. In this case the states of nature are defined as

$$\Theta = \{\Pi^N, \ldots, \Pi^N\}$$

and the a priori density is

$$\xi_0(\omega) = f_{VB}(\pi^1 \mid m^1) \cdots f_{VB}(\pi^K \mid m^K)$$

where $f_{VB}(\pi^i \mid m^i)$ is the vector beta density with "parameter" $m^i$ and is given by $h_i(p_i \mid m_i)$ in Equation 3.1 with $p_i$ replaced by $\pi^i$.

The posteriori density is also vector beta with "parameter" $m^i + f^i$, $i = 1, \ldots, K$,

$$\xi(\omega \mid \mathbf{x}_n) = f_{VB}(\pi^1 \mid m^1 + f^1) \cdots f_{VB}(\pi^K \mid m^K + f^K)$$

Observation $\mathbf{x}_n$ is the same as defined in Chapter 4. Vector $f^i$ is called the frequency count vector. The $j$th component of $f^i$ is the number of times state $j$ is observed under decision $i$ in sequence $\mathbf{x}_n$.

The loss function for this case is also the same as defined in Chapter 4. The decision maker chooses the decision $k^*$ that minimizes $\eta_i(k)$. The expected value of the gain $E(\Lambda_k \mid \mathbf{x}_n)$ is computed as

$$E(\Lambda_k \mid \mathbf{x}_n) = \langle E(\pi^k \mid \mathbf{x}_n),\ r^k \rangle$$

The expected value of $\pi^k$ is given by the properties of the matrix beta density in Equation 3.2,

$$E(\pi^k \mid \mathbf{x}_n) = \frac{1}{\alpha_k}\,(m^k + f^k)$$

where

$$\alpha_k = \sum_{j=1}^{N} (m_j^k + f_j^k)$$

Therefore, $\eta_i(k)$ is given by

$$\eta_i(k) = \frac{1}{\alpha_k} \langle m^k + f^k,\ r^k \rangle - \frac{1}{\alpha_i} \langle m^i + f^i,\ r^i \rangle \tag{5.1}$$


This expression is easy to evaluate. No integration operation is required. Vectors $m^i$, $i = 1, \ldots, K$ are specified when the a priori density is defined, and vectors $f^i$, $i = 1, \ldots, K$ are defined by the sequence observed from the Markov chains.

The critical assumption made in this approach is that the Markov chain being observed is in steady state. However, the Markov chain under observation may not be in steady state. Since the transition probabilities are unknown, the probability distribution over the states is unknown.
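The frequency-count form of the expected gain needs only sums and dot products. The sketch below is a minimal illustration with hypothetical names and made-up counts; it computes $\langle m^k + f^k, r^k \rangle / \alpha_k$ for each decision and picks the decision with the largest posterior expected gain, which is the decision with the smallest expected loss under the loss function of Chapter 4.

```python
def expected_gain(m, f, r):
    """<m + f, r> / alpha, with alpha the sum of the components of m + f."""
    alpha = sum(mj + fj for mj, fj in zip(m, f))
    return sum((mj + fj) * rj for mj, fj, rj in zip(m, f, r)) / alpha

# Hypothetical two-decision, two-state data (state visit counts are made up).
m = {1: (2, 2), 2: (2, 2)}        # a priori parameters
f = {1: (9, 11), 2: (12, 8)}      # frequency count vectors
r = {1: (6, -3), 2: (6, -5)}      # reward vectors

gains = {k: expected_gain(m[k], f[k], r[k]) for k in m}
preferred = max(gains, key=gains.get)
```

The counts here are state visits, not transitions, which is why the approach rests on the steady state assumption discussed above.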
Suppose that the Markov chain under consideration has been
operating under decision 1 for a large number of transitions. Assume
that the probability distribution over the states is $\pi^1$, the steady
state probability distribution. Now, decision 1 is switched to decision
$m$. The probability distribution over the states after $n$ transitions is

$$\pi^m(n) = \pi^1 P_m^{\,n}$$

where $P_m$ is the transition matrix under decision $m$.
If observations are recorded before $\pi^m(n) \to \pi^m$, then the posteriori
distribution over the states of nature will not accurately reflect the
knowledge of the steady state probabilities. The question asked is:
How many transitions are required before $\pi^m(n)$ is "close enough"
to $\pi^m$ so that observations can be recorded? This question will be
answered for special classes of Markov chains in Chapter 6, but
under conditions of perfect knowledge of the transition probabilities.
Since the transition probabilities are unknown here, these results cannot
be used directly. Aside from these theoretical problems, this approach has
great practical appeal, especially for Markov chains with large
numbers of states.
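The transient after a decision switch can be illustrated numerically. The sketch below assumes the transition matrices are known, which is exactly what is not available in the uncertain problem, so it only shows the effect being described. The matrices `P_old` and `P_new`, the tolerance, and both function names are hypothetical.

```python
import numpy as np

def steady_state(P):
    """Stationary row vector pi with pi P = pi and sum(pi) = 1."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Stack the balance equations with the normalization constraint.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def transitions_until_close(P_old, P_new, eps, max_steps=10_000):
    """Count transitions under P_new, starting from the steady state of
    P_old, until the state distribution is within eps of the new one."""
    P_new = np.asarray(P_new, dtype=float)
    dist = steady_state(P_old)
    target = steady_state(P_new)
    for n in range(max_steps + 1):
        if np.linalg.norm(dist - target) < eps:
            return n
        dist = dist @ P_new                   # one more transition
    raise RuntimeError("did not converge within max_steps")

# Illustrative matrices only; they are not taken from the text.
P_old = [[0.9, 0.1], [0.2, 0.8]]
P_new = [[0.3, 0.7], [0.6, 0.4]]
n_skip = transitions_until_close(P_old, P_new, eps=0.01)
print(n_skip)
```

A sampling strategy that discards the first few transitions after each switch is a practical stand-in for this calculation when the matrices are unknown.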

Example 5.1: The following two state, two decision example
is presented to indicate that the approach presented in this chapter
does give good results. The a priori "parameters" $m^1$ and $m^2$ are
given by

$$m^1 = m^2 = (2 \;\; 2)$$

The transition matrices simulated were

$$P_1 = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \end{bmatrix}, \qquad P_2 = \begin{bmatrix} 0.5 & 0.5 \\ 0.7 & 0.3 \end{bmatrix}$$

and the rewards were

$$r^1 = (6 \;\; {-3})$$
$$r^2 = (6 \;\; {-5})$$

The sampling strategy chosen was to sample twenty transitions under
each decision. When one decision was switched to another, the first
ten transitions were not recorded to allow the probability distribution
to near steady state. The transformed risk $\eta_1(k)$ is given by

$$\eta_1(1) = 0$$

$$\eta_1(2) = \frac{1}{\alpha_1}\,\langle m^1 + f^1,\, r^1 \rangle - \frac{1}{\alpha_2}\,\langle m^2 + f^2,\, r^2 \rangle$$

Variable $\eta_1(2)$ is plotted in Figure 5.1. As the number of observations
increased, $\eta_1(2)$ approached its true value of -0.42.
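The true value quoted in Example 5.1 can be checked directly from the steady state gains. The sketch below assumes the simulated matrices are $P_1 = [[0.5, 0.5], [0.4, 0.6]]$ and $P_2 = [[0.5, 0.5], [0.7, 0.3]]$, as read from the example, and that the plotted quantity is the gain under decision 1 minus the gain under decision 2. The helper name is hypothetical.

```python
import numpy as np

def two_state_steady_state(P):
    """Steady state of a two state chain with off-diagonal entries a and b."""
    a = P[0][1]                    # probability of moving from state 1 to 2
    b = P[1][0]                    # probability of moving from state 2 to 1
    return np.array([b, a]) / (a + b)

P1 = [[0.5, 0.5], [0.4, 0.6]]
P2 = [[0.5, 0.5], [0.7, 0.3]]
r1 = np.array([6.0, -3.0])
r2 = np.array([6.0, -5.0])

g1 = two_state_steady_state(P1) @ r1   # gain under decision 1
g2 = two_state_steady_state(P2) @ r2   # gain under decision 2
eta_true = g1 - g2
print(round(eta_true, 2))
```

Under these assumptions the difference is $-5/12$, which rounds to the -0.42 stated in the example, and decision 2 is the one with the larger gain.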


Chapter  6 

CONVERGENCE  PROPERTIES  OF 
TWO  STATE  MARKOV  CHAINS 

Given an ergodic transition matrix $P$ and an initial state probability
vector $\pi(0)$, the state probability vector after $n$ transitions,
$\pi(n)$, is given by

$$\pi(n) = \pi(0)\,P^n \tag{6.1}$$

As $n$ approaches infinity, vector $\pi(n)$ asymptotically approaches the
steady state probability vector $\pi$. Using set $T^N$ and its product set,
the convergence rate of $\pi(n)$ to $\pi$ for two state Markov chains can be
stated explicitly.

In order to determine convergence of $\pi(n)$ to $\pi$, both probability
vectors must be transformed to vectors in $T^N$. Vector $\pi(n)$ can be
written as $\pi_0 + s(n)\,U$. For convenience, vector $\pi_0$ is defined as the
steady state probability vector $\pi$. Using this definition

$$\pi(n) = \pi + s(n)\,U$$

and vector $\pi(n) - \pi$ becomes

$$\pi(n) - \pi = s(n)\,U \tag{6.2}$$

The next step is to evaluate vector $s(n)$. From Equation 6.1

$$\pi(n) = \pi + s(n)\,U = (\pi + s(0)\,U)\,P^n$$

or, since $\pi P^n = \pi$ and $UV = I$,

$$s(n) = s(0)\,U P^n V \tag{6.3}$$

Matrix $P$ can be written as

$$P = P_0 + SU$$

Similarly, matrix $P^n$ can be written as

$$P^n = P_0 + S(n)\,U \tag{6.4}$$

Substituting Equation 6.4 into Equation 6.3 gives

$$s(n) = s(0)\,U P_0 V + s(0)\,U S(n)\,U V = s(0)\,U S(n) \tag{6.5}$$


Matrix $S(n)$ can be evaluated by manipulating the identity

$$P^{n+1} = P^n P \tag{6.6}$$

The $i$th row of Expression 6.6 satisfies

$$p_i^{n+1} = p_i^n P$$

or

$$\pi + s_i(n+1)\,U = (\pi + s_i(n)\,U)\,P$$

and

$$s_i(n+1) = s_i(n)\,U P V$$

Therefore

$$S(n+1) = S(n)\,U P V \tag{6.7}$$
Substituting the identity $PV = S + P_0 V$ into Equation 6.7 gives

$$S(n+1) = S(n)\,US + S(n)\,U P_0 V = S(n)\,US$$

or

$$S(n+1) = S(1)\,(US)^n = S\,(US)^n \tag{6.8}$$

Substituting Equation 6.8 into Equation 6.5,

$$s(n) = s(0)\,US\,(US)^{n-1} = s(0)\,(US)^n \tag{6.9}$$

Substituting Equation 6.9 into Equation 6.2 gives the desired expression,

$$\pi(n) - \pi = s(0)\,(US)^n\,U \tag{6.10}$$

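Consider a two state Markov chain. "Matrices" $S$ and $U$ are
given by

$$S = s = \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}, \qquad U = u = \frac{1}{\sqrt{2}}\,(-1 \;\; 1)$$

Substituting these expressions into Equation 6.10 gives

$$\pi(n) - \pi = s(0)\,\langle s, u \rangle^n\, u = \left( \langle s, u \rangle^n\, s(0) \right) u$$

Expression $\langle s, u \rangle^n\, s(0)$ is given by

$$\langle s, u \rangle^n\, s(0) = \left( \frac{s_2 - s_1}{\sqrt{2}} \right)^n \langle \pi(0) - \pi,\, v \rangle$$

where $v$ is the two state form of $V$, so that $\langle u, v \rangle = 1$.
Therefore, Equation 6.10 for two state Markov chains becomes

$$\pi(n) - \pi = \left( \frac{s_2 - s_1}{\sqrt{2}} \right)^n \langle \pi(0) - \pi,\, v \rangle\, u \tag{6.11}$$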


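The rate of convergence of $\pi(n)$ to $\pi$ can be determined by
finding the integer $N$ such that $\|\pi(n) - \pi\|$ is less than some small
number $\epsilon$ for all $n > N$. The norm for two state Markov chains is

$$\|\pi(n) - \pi\| = \left| \frac{s_2 - s_1}{\sqrt{2}} \right|^n \left| \langle \pi(0) - \pi,\, v \rangle \right|$$

since $\|u\| = 1$. Letting $\|\pi(n) - \pi\| = \epsilon$ and solving for $n$ results
in the expression

$$n = \frac{\log \left[ \epsilon \,/\, \left| \langle \pi(0) - \pi,\, v \rangle \right| \right]}{\log \left| (s_2 - s_1)/\sqrt{2} \right|} \tag{6.12}$$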
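The transition count in Equation 6.12 can be sketched with the standard fact that, for a two state chain with off-diagonal probabilities $a$ and $b$, the deviation $\pi(n) - \pi$ shrinks by the factor $\lambda = 1 - a - b$ at every transition. Here $\lambda$ is assumed to play the role of the geometric factor in Equation 6.11. The function name, initial vector, and tolerance are hypothetical.

```python
import math
import numpy as np

def transitions_needed(P, pi0, eps):
    """Smallest n with ||pi(n) - pi|| <= eps for a two state chain."""
    P = np.asarray(P, dtype=float)
    a, b = P[0, 1], P[1, 0]
    lam = 1.0 - a - b                         # second eigenvalue of P
    if abs(lam) >= 1.0:
        raise ValueError("chain is not ergodic")
    pi = np.array([b, a]) / (a + b)           # steady state vector
    d0 = np.linalg.norm(np.asarray(pi0, dtype=float) - pi)
    if d0 <= eps:
        return 0
    if lam == 0.0:
        return 1                              # steady state after one step
    # Solve |lam|**n * d0 = eps for n and round up.
    return math.ceil(math.log(eps / d0) / math.log(abs(lam)))

P = np.array([[0.5, 0.5], [0.4, 0.6]])
pi0 = np.array([1.0, 0.0])                    # start in state 1
n_req = transitions_needed(P, pi0, eps=1e-3)
print(n_req)
```

The count grows only logarithmically as the tolerance shrinks, and a chain with $|\lambda|$ close to one needs many more transitions than one with $\lambda$ near zero.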


The expression for $\pi(n)$ can be used to investigate the convergence
of transition matrix $P^n$ to the limiting matrix $P^\infty$. For two
state Markov chains, the two transition vectors of $P^n$, $p_i^n$, are
written as

$$p_1^n = \pi + s_1(n)\,U$$
$$p_2^n = \pi + s_2(n)\,U$$

Convergence of $P^n$ to $P^\infty$ in set $A^2$ is equivalent to the convergence
of vector $t(n)$ to $0 = (0, 0)$ in set $T_2^2$, where

$$t(n) = (s_1(n),\, s_2(n))$$

$$\langle (\pi - \pi_0),\, v \rangle = 0$$

and where the expressions for scalars $s_1(n)$ and $s_2(n)$ are derived
rather simply from Equation 6.9,

$$s_1(n) = s_1(0)\,(US)^n \tag{6.13}$$

$$s_2(n) = s_2(0)\,(US)^n \tag{6.14}$$

where

$$s_1(0) = s_1 = \langle (p_1 - \pi),\, v \rangle \tag{6.15}$$

$$s_2(0) = s_2 = \langle (p_2 - \pi),\, v \rangle \tag{6.16}$$

Equations 6.13, 6.14, 6.15, and 6.16 combine to form vector $t(n)$,

$$t(n) = t\,(US)^n$$

or

$$t(n) = \left( \frac{s_2 - s_1}{\sqrt{2}} \right)^n (s_1,\, s_2) \tag{6.17}$$
Example 6.1: Consider a two state Markov chain with transition
matrix


Vector $\pi_0$ is defined by

$$\pi_0 = \pi = (1/3,\; 2/3)$$

Therefore, set $T^2$ is defined by

$$T^2 = \left\{ s \in E^1 \;\middle|\; -\tfrac{2\sqrt{2}}{3} \le s \le \tfrac{\sqrt{2}}{3} \right\}$$
Vector  t_(n)  in  Equation  6. 17  is 

i<n>  =({)"(#■¥) 
Chapter 7

SUMMARY  AND  RECOMMENDATIONS 


The risk minimizing decision $k^*$ was shown to also minimize function
$\eta_1(k)$ where

$$\eta_1(k) = E(g_1 \mid x_n) - E(g_k \mid x_n)$$


Therefore,  decision  k*  satisfies 


where 


tm  =  min|E(Ak|^1)-E(Ai|^1)| 
lc 


tn  =  tn  x  •  •  *  xTn  tlme8^ 

TN={i«EN-1|1uS£-TTo,£«nN| 
tt  =  e. 

-O  —1 


row  of 




.  til  £  --IS. 

i  row  of  F 


f*  = 

—l 


N 

hi<£0+iiu^iC+ih  =  u. 


m^.+fK-l 
X  0  ij 


The  objectives  of  this  research  were  met.  A  simple  solution 
to  the  Markov  decision  problem  with  uncertainty  has  been  derived. 
However,  the  problem  is  not  completely  solved.  The  following  is 
a  list  of  topics  that  require  future  research. 

1. The problem of selecting an "optimal" sampling strategy
and an "optimal" stopping rule is a candidate for future
research. Martin discusses this problem at length.
However, his results do not appear to be amenable to
practical application. It is possible that this problem
could be successfully analyzed by using the framework
developed in this dissertation.

2. In this paper the Markov decision process under consideration
was an infinite stage process. The case where
the Markov chain makes a finite number of transitions
should be analyzed. Here the sampling strategy will be
of primary importance because two goals will be present
during the life of the process. One is to maximize the
payoff and the other is to maximize the information. In
the infinite stage process the payoff was not an issue
since the Markov chain would make an infinite number
of transitions after a decision was selected.

3. The case where the uncertainty is placed over the
steady state probabilities was discussed in Chapter 5.
This approach simplifies the computations considerably.
However, errors are present because the decision maker
does not know whether the Markov chain is in steady
state. If some means of approximating the time when
steady state is "nearly" reached were found, this approach
might be more practical for a Markov chain with a large
number of states.